Create Toy Datasets #12

Open
amc-corey-cox opened this issue Jan 24, 2025 · 2 comments

Comments

@amc-corey-cox
Collaborator

These should be very simple datasets, roughly 10 columns by 10 rows, that we can use to make sure the tools run. We might keep using them later as test cases or for something else, but that is a later concern. Right now, we just want something we can use for testing right away.

@amc-corey-cox amc-corey-cox linked a pull request Jan 31, 2025 that will close this issue
@amc-corey-cox
Collaborator Author

Okay, a few things to keep in mind while creating a quick toy data set to get work started:

  • Toy data needs to be ~100 rows; some of our tools need that many to use their more valuable inference (i.e. Schema-Automator enums).
  • I'd like to have at least the basic things the BDC model needs to build its required objects (study/participant required info).
  • Ideally, we'll be growing this limited toy data set in two directions:
    • First, a better toy data set with coverage of all the main features of our tool-chain but not much else.
    • Also, a true synthetic data set that reflects the kind of data we actually expect to see. This would then be our synthetic data set, no longer a toy.
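
For illustration only, here is a minimal sketch of what generating such a ~100-row toy file could look like. Every column name and value set below is a hypothetical placeholder, not the actual BDC model fields; the only ideas taken from the list above are the row count, the presence of study/participant identifiers, and a few low-cardinality columns so that enum inference has something to find.

```python
import csv
import io
import random


def make_toy_rows(n_rows=100, seed=0):
    """Return n_rows dicts with 10 hypothetical columns each."""
    rng = random.Random(seed)  # fixed seed so the toy file is reproducible
    # Small, closed value sets: candidates for inferred enums.
    sexes = ["female", "male", "unknown"]
    smoking = ["never", "former", "current"]
    visits = ["baseline", "followup_1", "followup_2"]
    rows = []
    for i in range(n_rows):
        rows.append({
            "study_id": "TOY_STUDY_1",            # hypothetical study identifier
            "participant_id": f"P{i + 1:04d}",    # unique per row
            "sex": rng.choice(sexes),
            "age_years": rng.randint(18, 90),
            "height_cm": round(rng.uniform(145.0, 200.0), 1),
            "weight_kg": round(rng.uniform(45.0, 130.0), 1),
            "smoking_status": rng.choice(smoking),
            "visit": rng.choice(visits),
            "systolic_bp": rng.randint(90, 180),
            "consented": rng.choice(["yes", "no"]),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Print the header and the first few rows as a quick sanity check.
    print(rows_to_csv(make_toy_rows())[:300])
```

Using the standard library only keeps the sketch dependency-free; the fixed seed means the same file comes out on every run, which is convenient if the toy data later becomes a test fixture.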

@amc-corey-cox
Collaborator Author

Our most reasonable starting point is the synthetic data set made available to us on BDC: prune it down and add what we need.
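
As a rough sketch of the "prune it down" step: the format of the BDC synthetic data set is not described here, so this assumes it has already been loaded as a list of dicts (for example with `csv.DictReader`). The function name and parameters are hypothetical.

```python
import random


def prune_rows(rows, keep_columns, n_rows=100, seed=0):
    """Keep only keep_columns and at most n_rows randomly sampled rows."""
    # Fail early if a requested column is not in the source data.
    missing = [c for c in keep_columns if rows and c not in rows[0]]
    if missing:
        raise KeyError(f"columns not in source data: {missing}")
    rng = random.Random(seed)  # fixed seed so the pruned set is reproducible
    sampled = rows if len(rows) <= n_rows else rng.sample(rows, n_rows)
    # Project each sampled row down to the requested columns, in order.
    return [{c: row[c] for c in keep_columns} for row in sampled]
```

Any columns we need that the source lacks would then be added to the pruned rows by hand or by a small follow-up step.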

@amc-corey-cox amc-corey-cox changed the title from "Create Toy Datasets for each step of the ingest" to "Create Toy Datasets" Feb 21, 2025
@amc-corey-cox amc-corey-cox removed a link to a pull request Feb 21, 2025
@amc-corey-cox amc-corey-cox self-assigned this Feb 28, 2025