Add docs for data/datasets and how to configure them in GuideLLM #137

Open
wants to merge 4 commits into
base: main

Conversation

markurtz
Member

@markurtz markurtz commented Apr 25, 2025

Fixes #133

Build artifacts (.whl and .tar.gz) are available for download for up to 30 days.
They are located at https://github.com/neuralmagic/guidellm/actions/runs/14657660273/artifacts/3007230813

Build artifacts (.whl and .tar.gz) are available for download for up to 30 days.
They are located at https://github.com/neuralmagic/guidellm/actions/runs/14657675243/artifacts/3007234607

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds a new documentation page for configuring data and datasets in GuideLLM, and updates the README to include a link to the new guide.

  • Introduces detailed guidance on dataset configurations, examples, and usage.
  • Updates the main README to reference the new Data/Datasets Guide.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| docs/datasets.md | New documentation outlining dataset configuration and usage. |
| README.md | Updated navigation to include the new Data/Datasets guide. |

Build artifacts (.whl and .tar.gz) are available for download for up to 30 days.
They are located at https://github.com/neuralmagic/guidellm/actions/runs/14657701691/artifacts/3007241591

@SharonGil

@markurtz Thank you very much for this, this simplifies and makes things much clearer.

2 small comments from my end:

  1. In lines 27 and 142, you used the suffix .ext. I just wanted to make sure this is intentional, as a placeholder for some external file, rather than a typo for .txt.
  2. As for the data formats themselves, it's written "Ensure the file format matches the expected structure for the dataset". For HF models, maybe add a short elaboration on the format the 'dataset' library expects, so that readers can tell which HF models are supported and can be used and which aren't, rather than relying on trial and error.
    Also, for the file-based DS format, there are a few examples for .txt, .csv, and .json files. Do these represent the only formats supported for these files? Meaning: for .txt files the data should be plain newline-delimited strings that represent the prompts, without any keys added and without output, but for .csv, for example, you have to provide prompt and output columns?

@markurtz
Member Author

@SharonGil take another look now; I've updated the docs based on this feedback to make the points you raised clearer and to try to answer your questions.

Build artifacts (.whl and .tar.gz) are available for download for up to 30 days.
They are located at https://github.com/neuralmagic/guidellm/actions/runs/14669119659/artifacts/3011014183

@SharonGil

SharonGil commented Apr 25, 2025

@markurtz That's perfect. Thanks a lot for the clarifications!

Development

Successfully merging this pull request may close these issues.

Documentation regarding the DS format to be fed to GuideLLM
2 participants