…#382) When every record fails during generation (e.g. LLM or image model errors), the empty dataset previously reached the profiler, which raised a misleading DatasetProfilerConfigurationError about missing columns. Now both preview() and create() check for an empty dataset before profiling and raise a DataDesignerGenerationError with an actionable message instead. Made-with: Cursor
Greptile Summary
This PR fixes a poor user experience where all records being dropped during generation (due to LLM or image model errors) caused an empty dataset to propagate to the profiler, which then raised a misleading DatasetProfilerConfigurationError.
Key changes:
| Filename | Overview |
|---|---|
| packages/data-designer/src/data_designer/interface/data_designer.py | Adds early empty-dataset guards in both create() and preview() with dedicated try/except for load_dataset_with_dropped_columns(); removes now-redundant len(processed_dataset) > 0 guard from the success-log condition. |
| packages/data-designer/tests/interface/test_data_designer.py | Adds three new error-path tests: empty DataFrame from load_dataset_with_dropped_columns, FileNotFoundError from same, and empty DataFrame from process_preview; tests for create() run the real sampler build pipeline before hitting the patch. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[create / preview called] --> B[builder.build / build_preview + process_preview]
B -- Exception --> C[raise DataDesignerGenerationError\n'Error generating dataset']
B -- Success --> D{create or preview?}
D -- create --> E[load_dataset_with_dropped_columns]
E -- Exception --> F[raise DataDesignerGenerationError\n'Failed to load generated dataset']
E -- Empty DataFrame --> G[raise DataDesignerGenerationError\n'Dataset is empty']
E -- Non-empty DataFrame --> H[profiler.profile_dataset]
D -- preview --> I{len processed_dataset == 0?}
I -- Yes --> J[raise DataDesignerGenerationError\n'Dataset is empty']
I -- No --> H
H -- Exception --> K[raise DataDesignerProfilingError]
H -- Success --> L[Return Results]
Last reviewed commit: e686dbb
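The `create()` branch of the flowchart can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the two exception classes are local stand-ins for the library's classes of the same name, and `create_sketch` with its `build`, `load_dataset`, and `profile` callables is a hypothetical simplification of the builder, loader, and profiler. The message strings are taken from the flowchart labels where available.

```python
class DataDesignerGenerationError(Exception):
    """Local stand-in for the library's exception of the same name."""


class DataDesignerProfilingError(Exception):
    """Local stand-in for the library's exception of the same name."""


def create_sketch(build, load_dataset, profile):
    """Hypothetical simplification of the create() flow in the flowchart."""
    # Step B: any failure while building is a generation error.
    try:
        build()
    except Exception as e:
        raise DataDesignerGenerationError(f"Error generating dataset: {e}") from e

    # Step E: loading the generated dataset has its own try/except.
    try:
        dataset = load_dataset()
    except Exception as e:
        raise DataDesignerGenerationError(f"Failed to load generated dataset: {e}") from e

    # Step G: stop before the profiler ever sees an empty dataset.
    if len(dataset) == 0:
        raise DataDesignerGenerationError("Dataset is empty")

    # Steps H/K: only a non-empty dataset reaches the profiler.
    try:
        return profile(dataset)
    except Exception as e:
        raise DataDesignerProfilingError(f"Error profiling dataset: {e}") from e
```

The point of the ordering is that the emptiness check sits between loading and profiling, so the profiler is never invoked with zero records and cannot produce its misleading missing-columns error.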
Moves the call back inside the try/except block so ArtifactStorageError is caught and re-raised as DataDesignerProfilingError, preserving the documented API contract. Also reduces test num_records to 1. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
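The contract mentioned in this commit message, where a storage-level failure surfaces to callers as the documented profiling error, can be illustrated with Python exception chaining. The sketch below is an assumption-laden illustration: both classes are local stand-ins for the library's exceptions, and `run_inside_profiling_block` and its `step` argument are hypothetical names for whatever call was moved back inside the try/except.

```python
class ArtifactStorageError(Exception):
    """Local stand-in for the library's exception of the same name."""


class DataDesignerProfilingError(Exception):
    """Local stand-in for the library's exception of the same name."""


def run_inside_profiling_block(step):
    """Hypothetical wrapper: run `step` and translate storage errors."""
    try:
        return step()
    except ArtifactStorageError as e:
        # `from e` keeps the original error available as __cause__,
        # so the traceback still shows the storage failure.
        raise DataDesignerProfilingError(f"Error profiling dataset: {e}") from e
```

A caller that only handles `DataDesignerProfilingError` keeps working, while the underlying `ArtifactStorageError` remains inspectable through the chained exception.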
nit: I think |
andreatgretel left a comment
Just a couple of nits, nothing blocking. Thanks for addressing the create() case too!
cleaned up in e686dbb
Summary
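- When every record fails during generation (e.g. LLM or image model errors), the empty dataset previously reached the profiler, which raised a misleading `DatasetProfilerConfigurationError` about missing columns.
- `preview()` and `create()` now check for an empty dataset before profiling and raise a `DataDesignerGenerationError` with an actionable message instead.

Closes #382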
Test plan
- `test_create_raises_generation_error_when_dataset_is_empty`: verifies `create()` raises `DataDesignerGenerationError` when all records are dropped
- `test_preview_raises_generation_error_when_dataset_is_empty`: verifies `preview()` raises `DataDesignerGenerationError` when all records are dropped
- … (`make test`)
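A test of this shape might look like the sketch below. It is not the PR's test code: `FakeDesigner` is a hypothetical minimal class standing in for the real interface, the exception is a local stand-in, and placing `load_dataset_with_dropped_columns` directly on that class is an assumption made to keep the example self-contained. Only the patching pattern (forcing the loader to return an empty result) mirrors what the file overview describes.

```python
from unittest.mock import patch


class DataDesignerGenerationError(Exception):
    """Local stand-in for the library's exception of the same name."""


class FakeDesigner:
    """Hypothetical minimal interface; not the real DataDesigner class."""

    def load_dataset_with_dropped_columns(self):
        return [{"text": "a record"}]

    def create(self):
        dataset = self.load_dataset_with_dropped_columns()
        if len(dataset) == 0:
            raise DataDesignerGenerationError("Dataset is empty")
        return dataset


def test_create_raises_generation_error_when_dataset_is_empty():
    designer = FakeDesigner()
    # Simulate "all records dropped" by patching the loader to return nothing.
    with patch.object(
        FakeDesigner, "load_dataset_with_dropped_columns", return_value=[]
    ):
        try:
            designer.create()
        except DataDesignerGenerationError as e:
            assert "empty" in str(e)
        else:
            raise AssertionError("expected DataDesignerGenerationError")
```

Patching the loader rather than the whole pipeline keeps the rest of `create()` running for real, which matches the overview's note that the `create()` tests run the actual build pipeline before hitting the patch.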