Replies: 1 comment
If you want something nearly identical to the previous synthetic dataset behavior, I suggest taking a look at |
I'd like to ask about the current and intended behavior of the `source` parameter for synthetic datasets.

Question

The documentation lists a `source` parameter, meaning that synthetic prompts can be generated from custom source text (file or URL). However, in guidellm 0.5.x, prompt generation for synthetic datasets uses only the Faker library, and the `source` parameter appears to be ignored.

- … `source`?

Additional context

- The `_create_prompt` method (src/guidellm/data/deserializers/synthetic.py) does not use `config.source` at all.
- `src/guidellm/utils/text.py` contains an `EndlessTextCreator` class that appears designed for this purpose.
- In `TestSyntheticDatasetConfig` (tests/unit/data/deserializers/test_synthetic.py), the source param is present.

Would appreciate clarification on:

- Is the `source` parameter as a text source for synthetic prompts intended? If it is no longer a feature, would a PR to restore/implement this be welcome?

Thanks for your input 😊
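
For illustration, here is a rough sketch of the behavior I have in mind: prompts are cut from a user-supplied source text, wrapping around when the text runs out. This is not guidellm code. The `make_prompts` helper and its parameters are hypothetical, and it counts whitespace-separated words, whereas a real implementation would presumably count tokens with the model's tokenizer.

```python
import itertools


def make_prompts(source_text: str, words_per_prompt: int, count: int) -> list[str]:
    """Hypothetical helper, not part of guidellm.

    Cuts `count` prompts of `words_per_prompt` words each from `source_text`,
    wrapping around to the start of the text whenever it is exhausted.
    """
    words = source_text.split()
    if not words:
        raise ValueError("source text contains no words")
    # cycle() repeats the word list indefinitely, so any number of prompts fits.
    stream = itertools.cycle(words)
    return [
        " ".join(itertools.islice(stream, words_per_prompt))
        for _ in range(count)
    ]


prompts = make_prompts("the quick brown fox jumps", words_per_prompt=3, count=3)
print(prompts)
# ['the quick brown', 'fox jumps the', 'quick brown fox']
```

Each prompt continues where the previous one stopped, so a short source still yields varied prompts instead of repeating the same opening words.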