Create LinkML files for all Toy data #55

Open
5 tasks
twhetzel opened this issue Mar 13, 2025 · 6 comments

@twhetzel
Collaborator

twhetzel commented Mar 13, 2025

Given the toy data in toy_data/initial of:

  • demographics.tsv
  • lab_results.tsv
  • sample.tsv
  • study.tsv
  • subject.tsv

Create a LinkML schema for each file using SchemaAutomator. It is OK for each of these to be a separate model file.
The SchemaAutomator documentation should be updated with any new information that turns out to be needed.

@twhetzel twhetzel self-assigned this Mar 13, 2025
@twhetzel twhetzel added the Data Transformation label Mar 13, 2025
@twhetzel
Collaborator Author

Based on the documentation, there are two possible commands:

  • Run on each data file individually using generalize-tsv as:
schemauto generalize-tsv --schema-name Demographics ../initial/demographics.tsv -o Demographics.yml

and

  • Run on all data files using generalize-tsvs as:
schemauto generalize-tsvs --schema-name Demographics ../initial/demographics.tsv --schema-name LabResults ../initial/lab_results.tsv --schema-name Sample ../initial/sample.tsv --schema-name Study ../initial/study.tsv --schema-name Subject ../initial/subject.tsv -o toy_data-all.yml
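
The per-file variant could also be wrapped in a loop so the schema names do not have to be typed by hand. This is only a rough sketch: the helper names schema_name_for and generalize_each and the DRY_RUN switch are made up for illustration, and the CamelCase naming simply mirrors the names used in the commands above. It uses only the generalize-tsv options already shown (--schema-name and -o).

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a CamelCase schema name from a TSV file name,
# e.g. lab_results.tsv -> LabResults (same naming as the commands above).
schema_name_for() {
  local base
  base="$(basename "$1" .tsv)"
  printf '%s\n' "$base" | awk -F'_' '{
    out = ""
    for (i = 1; i <= NF; i++) out = out toupper(substr($i, 1, 1)) substr($i, 2)
    print out
  }'
}

# Hypothetical helper: one generalize-tsv call per TSV file in a directory.
# With DRY_RUN=1 (the default in this sketch) the commands are only printed.
generalize_each() {
  local dir="$1" f name
  for f in "$dir"/*.tsv; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    name="$(schema_name_for "$f")"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "schemauto generalize-tsv --schema-name $name $f -o $name.yml"
    else
      schemauto generalize-tsv --schema-name "$name" "$f" -o "$name.yml"
    fi
  done
}
```

Calling generalize_each ../initial prints the five commands for review; setting DRY_RUN=0 would actually run them, assuming schemauto is on the PATH.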

@amc-corey-cox
Collaborator

amc-corey-cox commented Mar 19, 2025

Trish, I haven't looked deeply into this, so you likely know more than I do, but I was able to build it all into one schema using this:

schemauto generalize-tsvs ../dm-bip/toy_data/initial/*

Now, I don't know whether that is a good schema, or whether there are reasons we would want a separate schema for each file. My sense is that a single schema is probably more flexible across different data sets, since we don't have to rely on manually specifying schema names.

@amc-corey-cox
Collaborator

We probably do want to name the schema when we create it, like this:

schemauto generalize-tsvs --schema-name Toy_Schema ../dm-bip/toy_data/initial/*

@twhetzel
Collaborator Author

twhetzel commented Mar 19, 2025

Yes, I imagine that schemauto generalize-tsvs ../dm-bip/toy_data/initial/* also works, but I gather the --schema-name arg might be useful. There are some discrepancies between the web docs, the CLI docs, and which commands actually work, so I've been trying out different things to understand what works and how the commands are intended to be used.

Having one model file is fine with me; that was a question I wanted to run by you.

@amc-corey-cox
Collaborator

What I try to keep in mind is that ideally we'll have essentially no human interaction in this. So we want to be able to put the data somewhere, target that location, and have it do everything, perhaps with a variable to say which dataset we're working on.
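
As a rough sketch of that idea, a small wrapper could take a data location and a dataset name and build the single-schema command from them. The function name generalize_dataset, the DRY_RUN switch, the <data_root>/<dataset>/initial layout, and the output file name are all assumptions for illustration; only generalize-tsvs, --schema-name, and -o come from the commands earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: build one schema from every TSV file of a dataset.
# Assumed layout: <data_root>/<dataset>/initial/*.tsv
generalize_dataset() {
  local data_root="$1" dataset="$2"
  local in_dir="$data_root/$dataset/initial"
  local out="${dataset}_schema.yml"   # assumed output name

  # Replace the positional parameters with the matching TSV files.
  set -- "$in_dir"/*.tsv
  if [ ! -e "$1" ]; then
    echo "no TSV files found in $in_dir" >&2
    return 1
  fi

  # With DRY_RUN=1 (the default in this sketch) the command is only printed.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "schemauto generalize-tsvs --schema-name $dataset $* -o $out"
  else
    schemauto generalize-tsvs --schema-name "$dataset" "$@" -o "$out"
  fi
}
```

For the toy data this would be called as generalize_dataset ../dm-bip toy_data, so the dataset name is the only thing that changes between runs.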

@twhetzel
Collaborator Author

For the conversion, yes, no human interaction, but a human will then need to review the file(s) that are generated.

2 participants