feat(results): add export() method and --output-format CLI flag #540
przemekboruta wants to merge 3 commits into NVIDIA-NeMo:main
Adds DatasetCreationResults.export(path, format=) supporting jsonl, csv, and parquet. The CLI create command gains --output-format / -f which writes dataset.<format> alongside the parquet batch files.
Greptile Summary
This PR adds a …

| Filename | Overview |
|---|---|
| packages/data-designer/src/data_designer/interface/results.py | Adds ExportFormat Literal, SUPPORTED_EXPORT_FORMATS tuple, and export() method with correct per-format DataFrame serialisation and runtime validation. |
| packages/data-designer/src/data_designer/cli/controllers/generation_controller.py | Adds output_format param with early validation (before generation) and post-generation export call; correctly threads the new parameter through the controller. |
| packages/data-designer/src/data_designer/cli/commands/create.py | Adds --output-format / -f typer option, forwarded unchanged to the controller; no flag conflicts with existing -n / -d / -o options. |
| packages/data-designer/tests/interface/test_results.py | Adds 7 parametrised and targeted tests covering all formats, default behaviour, unsupported format error, and Path return type. |
| packages/data-designer/tests/cli/commands/test_create_command.py | Existing delegation tests updated to pass output_format=None; new test verifies --output-format is forwarded to the controller. |
| packages/data-designer/tests/cli/test_main.py | Minimal update: adds output_format=None to the expected call assertion in the existing dispatch test. |
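
The per-format dispatch and runtime validation described for results.py can be sketched roughly as follows. This is a minimal stand-in, not the PR's code: `export_records` is a hypothetical helper that takes plain dict records instead of a DataFrame so the jsonl and csv paths need only the standard library, and the parquet branch assumes pandas (with a parquet engine) is installed. Only the `SUPPORTED_EXPORT_FORMATS` name, the three formats, the `jsonl` default, the `ValueError` on unsupported formats, and the `Path` return value come from the PR description.

```python
import csv
import json
from pathlib import Path

# Name taken from the PR overview; the contents mirror the three listed formats.
SUPPORTED_EXPORT_FORMATS = ("jsonl", "csv", "parquet")


def export_records(records, path, format="jsonl"):
    """Hypothetical helper: write a list of dict records to `path`, return a Path."""
    # Validate at runtime so an unsupported format fails before any file is touched.
    if format not in SUPPORTED_EXPORT_FORMATS:
        raise ValueError(
            f"Unsupported export format {format!r}; "
            f"expected one of {SUPPORTED_EXPORT_FORMATS}"
        )
    path = Path(path)  # accept str or Path, always hand back a Path
    if format == "jsonl":
        # One JSON object per line.
        with path.open("w", encoding="utf-8") as fh:
            for record in records:
                fh.write(json.dumps(record) + "\n")
    elif format == "csv":
        # Header row taken from the first record's keys, then one row per record.
        fieldnames = list(records[0]) if records else []
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(records)
    else:
        # Parquet: assumes pandas and a parquet engine such as pyarrow are available.
        import pandas as pd

        pd.DataFrame(records).to_parquet(path, index=False)
    return path
```

In the PR itself the jsonl branch is reported to call `df.to_json(path, orient="records", lines=True)` on the loaded DataFrame; the loop above produces the same one-object-per-line layout for simple records.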
Sequence Diagram
```mermaid
sequenceDiagram
    participant User
    participant CLI as create_command (CLI)
    participant Ctrl as GenerationController
    participant DD as DataDesigner
    participant Results as DatasetCreationResults
    User->>CLI: data-designer create config.yaml -f jsonl
    CLI->>Ctrl: run_create(..., output_format="jsonl")
    Ctrl->>Ctrl: validate output_format in SUPPORTED_EXPORT_FORMATS
    Ctrl->>DD: create(config_builder, num_records, dataset_name)
    DD-->>Ctrl: DatasetCreationResults
    Ctrl->>Results: load_dataset()
    Results-->>Ctrl: pd.DataFrame
    Ctrl->>Results: export(artifact_path/dataset.jsonl, format="jsonl")
    Results->>Results: df.to_json(path, orient="records", lines=True)
    Results-->>Ctrl: Path("...dataset.jsonl")
    Ctrl-->>User: print "Exported to: ..."
```
Reviews (3): Last reviewed commit: "fix(cli): remove top-level results impor..."
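
The ordering in the controller (validate the format first, generate second, export last) is what keeps a typo in `--output-format` from costing a full generation run. A rough sketch of that flow, under stated assumptions: `run_create_sketch` and its injected `generate` / `export` callables are hypothetical stand-ins for the controller and `DatasetCreationResults`, and raising `ValueError` from the early check is an assumption (the real CLI may report the error differently).

```python
from pathlib import Path

# Name taken from the PR overview; values mirror the three listed formats.
SUPPORTED_EXPORT_FORMATS = ("jsonl", "csv", "parquet")


def run_create_sketch(generate, export, artifact_path, output_format=None):
    """Hypothetical controller flow: validate early, generate, then export."""
    # Early validation: reject a bad format before the expensive generation step.
    if output_format is not None and output_format not in SUPPORTED_EXPORT_FORMATS:
        raise ValueError(
            f"Unsupported output format {output_format!r}; "
            f"expected one of {SUPPORTED_EXPORT_FORMATS}"
        )

    results = generate()  # stand-in for the dataset creation call

    # The flag is optional: without it, nothing beyond the batch files is written.
    if output_format is None:
        return None

    # Export lands next to the batch files as dataset.<format>.
    target = Path(artifact_path) / f"dataset.{output_format}"
    exported = export(results, target, output_format)
    print(f"Exported to: {exported}")
    return exported
```

Passing the two steps in as callables is only a convenience for the sketch; it makes the "no generation on invalid format" property easy to check in isolation.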
Summary
- Adds DatasetCreationResults.export(path, format=) supporting jsonl, csv, and parquet formats
- Adds --output-format / -f flag to the data-designer create CLI command; writes dataset.<format> alongside the parquet batch files
- jsonl is the default export format; the parameter is optional in both the Python API and CLI

Usage
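Python API: call export(path, format=) on the DatasetCreationResults returned by create, for example with format="csv"; the method returns the written location as a Path.
CLI: data-designer create config.yaml -f jsonl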
Test plan
- test_export_writes_file: parametrized over all 3 formats
- test_export_jsonl_content: each line is valid JSON
- test_export_csv_content: header + data round-trip
- test_export_parquet_content: DataFrame round-trip
- test_export_default_format_is_jsonl
- test_export_unsupported_format_raises: raises ValueError
- test_export_returns_path_object: returns Path for str input
- Existing CLI tests updated for the output_format parameter