Datetime sampler column strips out data when generating small datasets #484

### Priority Level

Medium (Annoying but has workaround)

### Describe the bug

`DatetimeFormatMixin.postproc` uses a heuristic cascade to auto-detect the output format based on the variability of sampled values. The first data-dependent branch checks `series.dt.month.nunique() == 1` and, when true, returns only the year:

```python
# data_designer/engine/sampling_gen/data_sources/base.py, lines 101-102
if series.dt.month.nunique() == 1:
    return series.apply(lambda dt: dt.year).astype(str)
```

With `num_records=1`, `nunique()` is always 1 regardless of the actual sampled datetime, so the output is e.g. `'2026'` instead of `'2026-03-15T10:00:00'`. The same branch also fires for any larger batch whose records all share one month of the year, even across different years, because `dt.month` carries only the month number (1-12).
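The collapse can be reproduced with pandas alone. The sketch below copies only the branch quoted above; the function name `year_only_if_single_month` and the ISO fallback are assumptions made for illustration, since the rest of the real cascade is not shown in this issue.

```python
import pandas as pd


def year_only_if_single_month(series: pd.Series) -> pd.Series:
    """Hypothetical stand-in for the first data-dependent branch of postproc."""
    if series.dt.month.nunique() == 1:
        # Same branch as quoted above: everything but the year is dropped.
        return series.apply(lambda dt: dt.year).astype(str)
    # Assumed fallback for this sketch only: full ISO-8601 strings.
    return series.apply(lambda dt: dt.isoformat())


single = pd.Series(pd.to_datetime(["2026-03-15T10:00:00"]))
same_month = pd.Series(pd.to_datetime(["2024-03-02T08:00:00", "2026-03-15T10:00:00"]))
mixed = pd.Series(pd.to_datetime(["2024-01-02T08:00:00", "2026-03-15T10:00:00"]))

print(year_only_if_single_month(single).tolist())      # ['2026']
print(year_only_if_single_month(same_month).tolist())  # ['2024', '2026']
print(year_only_if_single_month(mixed).tolist())       # ['2024-01-02T08:00:00', '2026-03-15T10:00:00']
```

The second case shows that two records in March of different years still collapse to bare years, because only the month number is compared.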

The bare year string breaks `datetime.fromisoformat()` and any downstream code that expects an ISO-8601 timestamp.
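A minimal standard-library check of the downstream failure; `parses_as_iso` is a hypothetical helper written for this illustration, not part of the library.

```python
from datetime import datetime


def parses_as_iso(value: str) -> bool:
    """Return True if datetime.fromisoformat accepts the string."""
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True


print(parses_as_iso("2026-03-15T10:00:00"))  # True
print(parses_as_iso("2026"))                 # False: a bare year is rejected
```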

**Workaround:** Set `convert_to="%Y-%m-%dT%H:%M:%S"` on the `SamplerColumnConfig` to bypass the heuristic entirely.
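As a quick sanity check of the workaround's format string, the sketch below assumes `convert_to` is applied as a `strftime`-style pattern (the issue does not show how the library applies it) and confirms that the resulting string round-trips through `datetime.fromisoformat()`.

```python
from datetime import datetime

# Same format string as in the workaround above.
ISO_SECONDS_FORMAT = "%Y-%m-%dT%H:%M:%S"

sampled = datetime(2026, 3, 15, 10, 0, 0)
formatted = sampled.strftime(ISO_SECONDS_FORMAT)
round_tripped = datetime.fromisoformat(formatted)

print(formatted)                 # 2026-03-15T10:00:00
print(round_tripped == sampled)  # True
```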

**Suggested fix:** The heuristic branches in `DatetimeFormatMixin.postproc` (lines 101-108) are fragile for small sample sizes. Consider defaulting to ISO-8601 output when `convert_to` is not set, or at minimum requiring `len(series) > 1` before applying the adaptive formatting.
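One possible shape for the first option, sketched with pandas only. The function name `format_datetimes` and its signature are hypothetical and do not reflect the real `DatetimeFormatMixin.postproc`; the idea is simply that the output format never depends on the sampled values.

```python
from typing import Optional

import pandas as pd


def format_datetimes(series: pd.Series, convert_to: Optional[str] = None) -> pd.Series:
    """Hypothetical replacement: no data-dependent format detection."""
    if convert_to is not None:
        # An explicit format always wins.
        return series.dt.strftime(convert_to)
    # Default: full ISO-8601 strings, independent of batch size or variability.
    return series.apply(lambda dt: dt.isoformat())


one = pd.Series(pd.to_datetime(["2026-03-15T10:00:00"]))
print(format_datetimes(one).tolist())           # ['2026-03-15T10:00:00']
print(format_datetimes(one, "%Y-%m").tolist())  # ['2026-03']
```

Callers who want a reduced format such as year-only output would then request it explicitly instead of receiving it by accident from a small batch.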

### Steps/Code to reproduce bug

Here's a small script that demonstrates the behavior:

```python
import data_designer.config as dd
from data_designer.interface import DataDesigner

builder = dd.DataDesignerConfigBuilder()
builder.add_column(
    dd.SamplerColumnConfig(
        name="ts",
        sampler_type=dd.SamplerType.DATETIME,
        params=dd.DatetimeSamplerParams(
            start="2024-01-01",
            end="2026-06-30",
            unit="h",
        ),
    ),
)

designer = DataDesigner()
result = designer.preview(builder, num_records=1)

ts_value = result.dataset["ts"].iloc[0]
print(f"ts value: {ts_value!r}")

# With 1 record, month.nunique() == 1, so postproc returns just the year.
assert len(ts_value) == 4, f"Expected bare year, got {ts_value!r}"
print(f"\nBug confirmed: DATETIME sampler returned bare year '{ts_value}' for a single-record preview.")
print("❌ This breaks datetime.fromisoformat() and any downstream ISO-8601 parsing.")

print("\n" + "-" * 80 + "\n")

# With 2+ records that span different months, postproc falls through to isoformat.
result_multi = designer.preview(builder, num_records=10)
values = result_multi.dataset["ts"].tolist()
print(f"\nWith 10 records: {values[:3]} ...")
assert any(len(v) > 4 for v in values), "Expected full ISO strings with multiple records"
print("✅ With enough records the months vary, so postproc returns full ISO strings.")
```

### Expected behavior

I would expect the sampler to always return either (1) a Python `datetime` object or (2) an ISO-8601 formatted timestamp string. When `postproc` strips away parts of the datetime string, it breaks downstream parsers.

### Additional context

_No response_
