fix: addressed holdout=0 for generation #343
Merged
seayang-nv merged 3 commits into main on Apr 3, 2026
Signed-off-by: Sean Yang <seayang@nvidia.com>
nina-xu
previously approved these changes
Apr 2, 2026
Signed-off-by: Sean Yang <seayang@nvidia.com>
kendrickb-nvidia
approved these changes
Apr 3, 2026
mckornfield
approved these changes
Apr 3, 2026
Summary
Root Cause: When `holdout=0`, `process_data()` correctly produces no test set (`_test_df = None`), but it still creates an empty 0-byte `test.csv` via `touch()`. On resume, `load_from_save_path()` sees that the file exists and unconditionally calls `pd.read_csv()` on it, which raises `EmptyDataError` because the file has no content.

Changes:
- `library_builder.py` -- `process_data()`: Removed the `else: touch()` branch so no `test.csv` is written when there is no holdout set. This prevents the empty file from being created in the first place.
- `library_builder.py` -- `load_from_save_path()`: Changed the loading condition to only require `training.csv`. If `test.csv` is missing or empty (0 bytes), `_test_df` is set to `None` instead of attempting `pd.read_csv()`. This fixes the crash on resume and provides backward compatibility with saved runs that already have an empty `test.csv` on disk.
- `cli/utils.py`: Relaxed the CLI resume validation to only require `training.csv` to exist, since `test.csv` is legitimately absent when `holdout=0`.
- `tests/sdk/test_process_data.py`: Added three tests covering the fix -- no `test.csv` written when holdout is zero, successful resume without `test.csv`, and backward-compat handling of an empty `test.csv` from older runs.

Pre-Review Checklist
Ensure that the following pass:
- `make format && make check` or via prek validation.
- `make test` passes locally
- `make test-e2e` passes locally
- `make test-ci-container` passes locally (recommended)
- `/sync` on this PR to trigger a run (auto-triggers on ready-for-review)

Ran e2e on `shoppers.csv` on the following config:
`safe-synthesizer run --data-source data/shoppers.csv --config safe-synthesizer-config.yaml`
then ran only generation with the trained adapter above
`safe-synthesizer run generate --data-source data/shoppers.csv --config safe-synthesizer-config.yaml --run-path safe-synthesizer-artifacts/safe-synthesizer-config---shoppers/2026-04-02T16\:27\:08/`

Pre-Merge Checklist
Other Notes
`run generate` errors when the original training job has holdout = 0 #276
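
For readers skimming the Summary, the save/load behavior it describes can be sketched roughly as follows. This is an illustration only, not the code in `library_builder.py`: the helper names `save_splits`, `load_splits`, `read_rows`, and `write_rows` are hypothetical, and the standard-library `csv` module stands in for pandas so the sketch runs without dependencies.

```python
import csv
import tempfile
from pathlib import Path
from typing import Optional

Rows = list[dict[str, str]]


def read_rows(path: Path) -> Rows:
    """Read a CSV file into a list of row dicts (stand-in for pd.read_csv)."""
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


def write_rows(path: Path, rows: Rows) -> None:
    """Write row dicts to a CSV file with a header line."""
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def save_splits(save_dir: Path, train: Rows, test: Optional[Rows]) -> None:
    """Always write training.csv; write test.csv only when a holdout set exists."""
    write_rows(save_dir / "training.csv", train)
    if test:
        write_rows(save_dir / "test.csv", test)
    # Deliberately no else-branch: no empty test.csv is touched when holdout is 0.


def load_splits(save_dir: Path) -> tuple[Rows, Optional[Rows]]:
    """Require only training.csv; treat a missing or 0-byte test.csv as no test set."""
    train = read_rows(save_dir / "training.csv")
    test_path = save_dir / "test.csv"
    if test_path.exists() and test_path.stat().st_size > 0:
        return train, read_rows(test_path)
    return train, None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = Path(tmp)
        save_splits(run_dir, [{"x": "1"}], None)
        # Simulate an older run that left an empty test.csv on disk.
        (run_dir / "test.csv").touch()
        print(load_splits(run_dir))
```

Checking the file size before parsing is what lets the loader tolerate empty `test.csv` files left behind by older runs, rather than relying on the writer never producing one.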