27 commits
a24f119
Merge content from main repo using Macro include_files
e-lo Sep 26, 2022
74b02d9
[ actually include the macros ]
e-lo Sep 26, 2022
9db30a9
Add Examples
e-lo Sep 28, 2022
7fdddfe
first hack at creating random example data
e-lo Sep 28, 2022
dba9e26
Initial example data.
e-lo Oct 7, 2022
eaa009a
Update Based on Comments
e-lo Oct 7, 2022
af15c9a
update validate-data workflow to match datapackage.json
e-lo Oct 7, 2022
c0acbd1
Merge branch 'main' into issue-40-validate-example-data
e-lo Oct 7, 2022
bcc5a0f
Merge upstream into issue-40-validate-example-data
e-lo Oct 7, 2022
3532b15
Clean up merge fails
e-lo Oct 7, 2022
53a05b9
Responses to comments
e-lo Oct 7, 2022
3aadfea
Fix validate CLI call
e-lo Oct 10, 2022
9661da0
Responses to comments
e-lo Oct 10, 2022
5332619
Removed pandas from markdown table writing
e-lo Oct 10, 2022
e05cd3c
pre-commit
e-lo Oct 10, 2022
9d50d5d
merging upstream changes
e-lo Jul 12, 2023
06a8294
lint
e-lo Jul 13, 2023
102e1c3
update/simply script for creating template files
e-lo Jul 13, 2023
d7a9756
add recommended fields
e-lo Jul 13, 2023
446ec34
Add datapackage documentation page
e-lo Jul 13, 2023
5088e6c
Merge remote-tracking branch 'upstream/main' into issue-40-validate-e…
e-lo Jul 13, 2023
ec3c7e8
add local validation scripts
e-lo Jul 14, 2023
1b94140
Merge remote-tracking branch 'origin/main' into pr/75
e-lo Sep 20, 2023
9d8f05f
pep8/precommit
e-lo Sep 20, 2023
88e19b0
came back - re-deleting
e-lo Sep 20, 2023
cb6dd18
bug/typo fixes
e-lo Dec 4, 2023
5bc9310
Merge remote-tracking branch 'origin/main' into pr/75
e-lo Dec 4, 2023
24 changes: 24 additions & 0 deletions .github/workflows/validate-data.yml
@@ -0,0 +1,24 @@
name: Validate-Samples

on:
  push:
    paths:
      - 'data/*/TIDES/*'
      - 'spec/*'
  pull_request:
    paths:
      - 'data/*/TIDES/*'
      - 'spec/*'
  workflow_dispatch:
  create:

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v2
      - name: Validate data
        uses: frictionlessdata/repository@v2
        with:
          packages: "data/*/TIDES/datapackage.json"
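
A similar check can be approximated locally before pushing. The sketch below is not the action's actual implementation: it loops over the same `data/*/TIDES/datapackage.json` glob used in the workflow and calls `frictionless validate` on each match. The function name `validate_packages` is hypothetical, and the sketch assumes the `frictionless` CLI is on the `PATH`.

```sh
# Hypothetical helper: validate every data package matched by the workflow's glob.
# Assumes the frictionless CLI is installed (pip install frictionless).
validate_packages() {
    local root="${1:-.}"
    local failures=0
    local package

    for package in "$root"/data/*/TIDES/datapackage.json; do
        # Skip the literal pattern when the glob matches nothing
        [ -f "$package" ] || continue
        echo "Validating $package"
        if ! frictionless validate "$package"; then
            failures=$((failures + 1))
        fi
    done

    # Succeed only if every package validated cleanly
    [ "$failures" -eq 0 ]
}
```

Running `validate_packages .` from the repository root returns a non-zero status if any package fails, which makes it usable in a pre-push hook.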
11 changes: 9 additions & 2 deletions .gitignore
@@ -1,6 +1,13 @@
.DS_Store*
.vscode*
/__pycache__/*
.env
/venv*
/site
/__pycache__
/site/*
# pages that are copied in from main repo
/docs/CONTRIBUTING.md
/docs/CODE_OF_CONDUCT.md
/docs/README.md
# pages that are generated from templates
/docs/tables.md
/docs/architecture.md
4 changes: 4 additions & 0 deletions .markdownlint.yaml
@@ -11,6 +11,10 @@ MD041: false
# Remove line length limit
MD013: false

# Remove "fenced code blocks should have language specified"
MD040: false

# Remove "no inline HTML"
# Remove code block style because it also fails on admonition
MD046: false

24 changes: 16 additions & 8 deletions README.md
@@ -30,14 +30,6 @@ Directories with TIDES data must contain metadata in a [`datapackage.json`][tide

[`/samples/template/datapackage.json`][template-datapackage] has a template datapackage which can be used.

## Sample Data

[Sample data][samples] can be found in the `/samples` directory, with one directory for each sample.

### Template

Templates of `datapackage.json` and each TIDES file type are located in the `/samples/template` directory.

## Validating TIDES data

TIDES data with a valid [`datapackage.json`](#data-package) can be easily validated using the [frictionless framework], which can be installed and invoked as follows:
@@ -53,6 +45,22 @@ Several other validation scripts and tools with more flexibility such as validat
bin/validate-datapackage [-v remote_spec_ref | -l local_spec_path] [-d dataset_path]
Review comment (Contributor):

There are 5 scripts in bin/, and the usage information isn't always clear about why you'd use one script or another. In addition, the names are all very similar. I think it would help a lot to document the purpose of each script here.

```

### Specific files

Specific files can be validated by running the frictionless framework against them and their corresponding schemas as follows:

```sh
frictionless validate vehicles.csv --schema https://raw.githubusercontent.com/TIDES-transit/TIDES/main/spec/schema.vehicles.json --schema-sync
```
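
To validate several files at once, the schema URL can be derived from each file name. This is only a sketch: it assumes every TIDES table follows the `schema.<table>.json` naming seen in the `vehicles.csv` example above, and the helper names `tides_schema_url` and `validate_tides_dir` are hypothetical.

```sh
# Build the schema URL for one TIDES table file.
# Assumption: schemas are named schema.<table>.json, as in the vehicles example.
tides_schema_url() {
    local csv_file="$1"
    local ref="${2:-main}"
    local table
    table="$(basename "$csv_file" .csv)"
    printf 'https://raw.githubusercontent.com/TIDES-transit/TIDES/%s/spec/schema.%s.json\n' "$ref" "$table"
}

# Validate every CSV file in a directory against its derived schema URL.
# Requires the frictionless CLI.
validate_tides_dir() {
    local dir="${1:-.}"
    local csv_file
    for csv_file in "$dir"/*.csv; do
        # Skip the literal pattern when the directory has no CSV files
        [ -e "$csv_file" ] || continue
        frictionless validate "$csv_file" --schema "$(tides_schema_url "$csv_file")" --schema-sync
    done
}
```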

## Sample Data

[Sample data](https://tides-transit.github.io/TIDES/main/samples) can be found in the `/samples` directory, with one directory for each sample.

### Template

Templates of `datapackage.json` and each TIDES file type are located in the `/samples/template` directory. They can be used to build out TIDES data, particularly samples. In practice, most TIDES data will be produced directly as output from software or scripts.

## Contributing to TIDES

Those who want to help with the development of the TIDES specification should review the guidance in [contributing].
4 changes: 2 additions & 2 deletions bin/replace-spec-in-datapackage
@@ -13,7 +13,7 @@ Arguments:
"

example_usage="
Example Usage:
Example Usage:
bin/replace-spec-in-datapackage samples/template/TIDES spec samples/template/TIDES/datapackage.tmp.json
"

@@ -95,4 +95,4 @@ jq --arg spec_path_prefix "$spec_path_prefix" --arg profile_file "$profile_file"
| .profile = ($profile_file)
' "$output_file" > "$output_file.tmp" && mv "$output_file.tmp" "$output_file"

echo "$output_file"
echo "$output_file"
12 changes: 6 additions & 6 deletions bin/utils
@@ -2,19 +2,19 @@

# Check if jsonschema-cli is installed
check_jsonschema-cli() {
if ! command -v jsonschema-cli >/dev/null 2>&1; then
echo >&2 "\033[31m!!! jsonschema-cli is required but not found.\033[0m
if ! command -v jsonschema-cli >/dev/null 2>&1; then
echo >&2 "\033[31m!!! jsonschema-cli is required but not found.\033[0m

You can install it using 'pip install jsonschema-cli'. Aborting."
exit 1
fi
}

# Check if frictionless is installed
check_frictionless() {
if ! command -v frictionless >/dev/null 2>&1; then
echo >&2 "\033[31m!!! frictionless is required but not found.\033[0m
if ! command -v frictionless >/dev/null 2>&1; then
echo >&2 "\033[31m!!! frictionless is required but not found.\033[0m

You can install it using 'pip install frictionless'. Aborting."
exit 1
fi
68 changes: 68 additions & 0 deletions bin/validate-data-package
Review comment (Contributor):

  • this file is not marked executable
  • how does it differ from bin/validate-datapackage, should it replace it?

@@ -0,0 +1,68 @@
#!/usr/bin/env bash

# Script: validate_data_package
# Description: Bash script to validate a Frictionless Data Package using the Frictionless CLI.
# Usage: validate_data_package [-v tides_version | -l local_schema_location] [-d dataset_location]
# -v tides_version: Optional. Specify the version of the TIDES specification or 'local' to
Review comment (Contributor):

Is this parameter necessary? Shouldn't this always use the schema specified in the datapackage or treat -l as an override? In other words, if -l is provided, override the datapackage, otherwise use the datapackage, no need for -v.

Review comment (Contributor):

In fact, it seems like -l should be -s, which lets you set any schema location, overriding any datapackage. It wouldn't need to be local, could also be a GitHub location or NFS or samba, &c.

Review comment (Contributor):

Also, the samples/template/TIDES doesn't validate, should it?

# use a local schema. Default is to use the schema specified in the datapackage.
# -l local_schema_location: Optional. Specify the location of the local schema directory.
# Default is '../spec'. Is only used if tides_version = local.
# -d dataset_location: Optional. Specify the location of the TIDES datapackage.json.
# Default is the current directory.

# Set default values
tides_version=""
local_schema_location="../spec"
dataset_location="."

# Parse command-line arguments
while getopts ":v:l:d:" opt; do
    case $opt in
        v)
            tides_version=$OPTARG
            ;;
        l)
            local_schema_location=$OPTARG
            ;;
        d)
            dataset_location=$OPTARG
            ;;
        \?)
            echo "Invalid option: -$OPTARG" >&2
            exit 1
            ;;
    esac
done

# Create a temporary data package if using a different schema reference or a local schema
tmp_datapackage=""
if [ "$tides_version" != "" ]; then
    tmp_datapackage=$(mktemp)
    cp "$dataset_location/datapackage.json" "$tmp_datapackage"
fi

# Set the schema URL based on the option chosen
schema_url=""
if [ "$tides_version" == "local" ]; then
    schema_path_prefix="$local_schema_location"
else
    schema_path_prefix="https://raw.githubusercontent.com/TIDES-transit/TIDES/$tides_version/spec"
fi

# Update the 'schema' property in the temporary copy of the datapackage.json file, if applicable
if [ "$tmp_datapackage" != "" ]; then
    # Use '|' as the sed delimiter because the prefix contains slashes; \1 keeps the schema file name
    sed -E -i "s|\"schema\": \"([^/\"]+\.schema\.json)\"|\"schema\": \"$schema_path_prefix/\1\"|g" "$tmp_datapackage"
    dataset_location="$tmp_datapackage"
fi

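# Validate the data package JSON against the TIDES schema.
# validate-data-package-json takes -r (remote ref) or -l (local schema) and -f (file); it has no -v option.
datapackage_file="$dataset_location"
if [ -d "$datapackage_file" ]; then
    datapackage_file="$datapackage_file/datapackage.json"
fi
if [ "$tides_version" == "local" ]; then
    ./validate-data-package-json -l "$local_schema_location" -f "$datapackage_file"
else
    ./validate-data-package-json -r "${tides_version:-main}" -f "$datapackage_file"
fi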

# Validate the Frictionless Data Package using the Frictionless CLI
frictionless validate "$dataset_location" --schema-sync

# Remove the temporary data package file, if applicable
if [ "$tmp_datapackage" != "" ]; then
    rm "$tmp_datapackage"
fi
64 changes: 64 additions & 0 deletions bin/validate-data-package-json
@@ -0,0 +1,64 @@
#!/usr/bin/env bash

# Script to validate a local JSON file against a schema specified in a GitHub repository.
# Usage: validate-data-package-json [-r ref | -l local_schema_location] [-f datapackage_file]
# -r ref: Optional. Specify the ref name of the GitHub repository. Default is 'main'.
# -l local_schema_location: Optional. Specify the location of the local schema directory.
# -f datapackage_file: Optional. Specify the location of the datapackage.json file. Default is 'datapackage.json' in the execution directory.

# Check if jsonschema-cli is installed
command -v jsonschema-cli >/dev/null 2>&1 || {
    echo >&2 "jsonschema-cli is required but not found. You can install it using 'pip install jsonschema-cli'. Aborting."
    exit 1
}

# Set default values
ref="main"
local_schema_location=""
datapackage_file="datapackage.json"

# Parse command-line arguments
while getopts ":r:l:f:" opt; do
    case $opt in
        r)
            ref=$OPTARG
            ;;
        l)
            local_schema_location=$OPTARG
            ;;
        f)
            datapackage_file=$OPTARG
            ;;
        \?)
            echo "Invalid option: -$OPTARG" >&2
            exit 1
            ;;
    esac
done

echo "Validating data package file $datapackage_file"

# Set the temporary directory path
temp_dir=$(mktemp -d)

# Set the schema file path based on the option chosen
schema_file=""
if [ "$local_schema_location" != "" ]; then
    schema_file="$local_schema_location/tides-data-package.json"
else
    # Download the schema file to the temporary directory
    schema_url="https://raw.githubusercontent.com/TIDES-transit/TIDES/$ref/spec/tides-data-package.json"
    schema_file="$temp_dir/data-package.json"

    # Abort if the schema cannot be found at the requested ref
    if ! curl -s --head --fail "$schema_url" >/dev/null; then
        echo "Schema file not found on GitHub for the specified ref: $ref" >&2
        exit 1
    fi
    curl -o "$schema_file" "$schema_url"
fi

# Validate datapackage against the downloaded schema
jsonschema-cli validate "$schema_file" "$datapackage_file"

# Clean up the temporary directory
rm -rf "$temp_dir"
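
The local-versus-remote choice made in this script can be isolated in a small function, which makes the precedence (a local schema directory overrides the remote ref) easier to test. This is a sketch under stated assumptions: the function name `resolve_profile_location` is hypothetical, and the URL layout is copied from the script above.

```sh
# Hypothetical helper: print where the tides-data-package.json profile should be read from.
# A non-empty local directory takes precedence over the remote ref.
resolve_profile_location() {
    local ref="${1:-main}"
    local local_dir="${2:-}"

    if [ -n "$local_dir" ]; then
        # Strip one trailing slash so "spec" and "spec/" behave the same
        printf '%s/tides-data-package.json\n' "${local_dir%/}"
    else
        printf 'https://raw.githubusercontent.com/TIDES-transit/TIDES/%s/spec/tides-data-package.json\n' "$ref"
    fi
}
```

For example, `resolve_profile_location main spec/` prints a local path, while `resolve_profile_location main` prints the raw GitHub URL for the `main` ref.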
8 changes: 4 additions & 4 deletions bin/validate-datapackage-to-profile
Review comment (Contributor):

This script and validate-data-package-json use -f but validate-data-package uses -d, they should probably use the same parameters for the same inputs

@@ -1,16 +1,16 @@
#!/usr/bin/env bash

description="Script to validate a local JSON file against a profile for tides-data-package specified in
description="Script to validate a local JSON file against a profile for tides-data-package specified in
profile field or optionally against a remote or local profile."

usage="
Usage: validate-datapackage-to-profile [-r remote_spec_ref | -l local_spec_path] [-f datapackage_file]
-r remote_spec_ref: Optional. Specify the ref name of the GitHub repository for validating against
a remote profile where the profile is in the sub-path /spec/tides-data-package.json.
a remote profile where the profile is in the sub-path /spec/tides-data-package.json.
Should not be used with -l option. Example: -r main
-l local_spec_path: Optional. Specify the location of the local tides-data-package-json to use.
Should not be used with -r option. Example: -l spec
-d dataset_path: Optional. Specify the path of the datapackage.json file.
-d dataset_path: Optional. Specify the path of the datapackage.json file.
Review comment (Contributor) on lines 7 to +13:

Usage line says -f, but help and parsing use -d

Default is datapackage.json. Example: -d samples/template/TIDES/datapackage.json
"

@@ -97,7 +97,7 @@ fi
if [ -f "$dataset_path" ]; then
datapackage_file="$dataset_path"
dataset_path=$(dirname "$dataset_path")
else
else
datapackage_file="$dataset_path/datapackage.json"
fi
check_valid_path "$datapackage_file"
4 changes: 3 additions & 1 deletion docs/index.md
@@ -1 +1,3 @@
{{ include_file('README.md', downshift_h1= False) }}
# TIDES Transit Specification Suite

{{ include_file('README.md', start_line = 2, downshift_h1= False) }}
2 changes: 1 addition & 1 deletion tests/test_samples_to_canonical
@@ -22,4 +22,4 @@ if output=$("$script" $args 2>&1); then
else
printf "\033[31m!!! %s encountered an error.\033[0m\n" "$script"
echo "$output" >&2
fi
fi
2 changes: 1 addition & 1 deletion tests/test_samples_to_local
@@ -22,4 +22,4 @@ if output=$("$script" $args 2>&1); then
else
printf "\033[31m!!! %s encountered an error.\033[0m\n" "$script"
echo "$output" >&2
fi
fi