Create LinkML files for all Toy data #55

twhetzel · 2025-03-13T17:04:09Z

Given the toy data in toy_data/initial of:

Create LinkML using SchemaAutomator for each file. It is ok to have each of these as separate model files.
The documentation for SchemaAutomator should be updated with any new information needed.


twhetzel · 2025-03-19T20:15:57Z

Based on the documentation, there are two possible commands:

Run on each data file individually using generalize-tsv as:

schemauto generalize-tsv --schema-name Demographics ../initial/demographics.tsv -o Demographics.yml

and

Run on all data files using generalize-tsvs as:

schemauto generalize-tsvs --schema-name Demographics ../initial/demographics.tsv --schema-name LabResults ../initial/lab_results.tsv --schema-name Sample ../initial/sample.tsv --schema-name Study ../initial/study.tsv --schema-name Subject ../initial/subject.tsv -o toy_data-all.yml
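For the per-file option, a minimal sketch of a loop that avoids typing each schema name by hand. It assumes the `generalize-tsv` invocation shown above works as written; the helper names `schema_name_for` and `generalize_each`, the `SCHEMAUTO` override variable, and the filename-to-schema-name rule (`lab_results.tsv` becomes `LabResults`) are hypothetical and not part of SchemaAutomator.

```shell
#!/bin/sh
# Sketch only: run generalize-tsv once per TSV file in a directory.
# SCHEMAUTO is a hypothetical override; set SCHEMAUTO=echo to preview the commands.
SCHEMAUTO="${SCHEMAUTO:-schemauto}"

# Hypothetical helper: derive a schema name from a file name,
# e.g. ../initial/lab_results.tsv -> LabResults
schema_name_for() {
  basename "$1" .tsv | awk -F_ '{
    out = ""
    for (i = 1; i <= NF; i++) out = out toupper(substr($i, 1, 1)) substr($i, 2)
    print out
  }'
}

# Hypothetical helper: one model file per TSV, written to the output directory.
generalize_each() {
  in_dir="$1"
  out_dir="$2"
  mkdir -p "$out_dir"
  for tsv in "$in_dir"/*.tsv; do
    [ -e "$tsv" ] || continue   # skip when the glob matches nothing
    name="$(schema_name_for "$tsv")"
    "$SCHEMAUTO" generalize-tsv --schema-name "$name" "$tsv" -o "$out_dir/$name.yml"
  done
}
```

Calling `generalize_each ../initial .` would then issue one `generalize-tsv` command per file, with the same argument shape as the Demographics example above.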

amc-corey-cox · 2025-03-19T20:53:20Z

Trish, I haven't looked deeply into this, so you likely know more than I do, but I was able to build it all into one schema using this:

schemauto generalize-tsvs ../dm-bip/toy_data/initial/*

Now, I don't know if that is a good schema, or if there are reasons we would want a separate schema for each file. My sense is that having it all as one single schema is probably more flexible across different data sets, since we don't have to rely on manually specifying schema names.

amc-corey-cox · 2025-03-19T20:56:18Z

We do probably want to name the schema when we make it... like this.

schemauto generalize-tsvs --schema-name Toy_Schema ../dm-bip/toy_data/initial/*

twhetzel · 2025-03-19T20:56:34Z

Yes, I imagine that schemauto generalize-tsvs ../dm-bip/toy_data/initial/* also works, but I gather the schema-name arg might be useful. There are some discrepancies between the web docs, the CLI docs, and which commands actually work, so I've been trying out different things to understand what works and how the commands are intended to be used.

Having one model file is fine with me, and that was a question I wanted to run by you.

amc-corey-cox · 2025-03-19T21:03:13Z

What I try to keep in mind is that ideally we'll have essentially no human interaction in this. So we want to be able to put the data somewhere, target that location and have it do everything, perhaps with a variable to say what dataset we're working on.
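A rough sketch of what that could look like, assuming the `generalize-tsvs` form with `--schema-name` and `-o` used earlier in this thread. The function name `build_dataset_schema`, the `SCHEMAUTO` override variable, the `<root>/<dataset>/initial` layout, and using the dataset name as the schema name are all assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: build one schema for a whole dataset, driven by a dataset variable.
# SCHEMAUTO is a hypothetical override; set SCHEMAUTO=echo to preview the command.
SCHEMAUTO="${SCHEMAUTO:-schemauto}"

# Hypothetical helper.
#   $1 dataset name, e.g. toy_data (also used as the schema name; an assumption)
#   $2 root directory that contains <dataset>/initial (default: current directory)
#   $3 output file (default: <dataset>-schema.yml)
build_dataset_schema() {
  dataset="$1"
  data_root="${2:-.}"
  out_file="${3:-$dataset-schema.yml}"
  in_dir="$data_root/$dataset/initial"

  # Collect every TSV in the input directory as the positional parameters.
  set -- "$in_dir"/*.tsv
  [ -e "$1" ] || { echo "no TSV files in $in_dir" >&2; return 1; }

  "$SCHEMAUTO" generalize-tsvs --schema-name "$dataset" "$@" -o "$out_file"
}
```

With this shape, the only per-run input is the dataset name, for example `build_dataset_schema toy_data ../dm-bip`, and the generated file is what a human would then review.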

twhetzel · 2025-03-19T21:06:21Z

For the conversion, yes, no human interaction, but then a human will need to review the file(s) that are generated.

twhetzel self-assigned this Mar 13, 2025

twhetzel added the Data Transformation label Mar 13, 2025
