igerber
diff --git a/‎CHANGELOG.md‎
Lines changed: 3 additions & 0 deletions b/‎CHANGELOG.md‎
Lines changed: 3 additions & 0 deletions
diff --git a/‎README.md‎
Lines changed: 32 additions & 0 deletions b/‎README.md‎
Lines changed: 32 additions & 0 deletions
diff --git a/‎ROADMAP.md‎
Lines changed: 2 additions & 2 deletions b/‎ROADMAP.md‎
Lines changed: 2 additions & 2 deletions
diff --git a/‎diff_diff/__init__.py‎
Lines changed: 16 additions & 0 deletions b/‎diff_diff/__init__.py‎
Lines changed: 16 additions & 0 deletions
@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added
+- **`BusinessReport` and `DiagnosticReport` (experimental preview)** - practitioner-ready output layer. `BusinessReport(results, ...)` produces plain-English narrative summaries (`.summary()`, `.full_report()`, `.export_markdown()`, `.to_dict()`) from any of the 16 fitted result types. `DiagnosticReport(results, ...)` orchestrates the existing diagnostic battery (parallel trends, pre-trends power, HonestDiD sensitivity, Goodman-Bacon, heterogeneity, design-effect, EPV) plus estimator-native diagnostics for SyntheticDiD (`pre_treatment_fit`, weight concentration, in-time placebo, zeta sensitivity) and TROP (factor-model fit metrics). Both classes expose an AI-legible `to_dict()` schema (single source of truth; prose renders from the dict). BR auto-constructs DR by default so summaries mention pre-trends, robustness, and design-effect findings in one call. See `docs/methodology/REPORTING.md` for methodology deviations including the no-traffic-light-gates decision, pre-trends verdict thresholds (0.05 / 0.30), and power-aware phrasing driven by `compute_pretrends_power`. **Both schemas are marked experimental in this release** - wording, verdict thresholds, and schema shape will change; do not anchor downstream tooling on them yet.
+
 ### Changed
 - Add Zenodo DOI badge to README; upgrade the BibTeX citation block with the concept DOI (`10.5281/zenodo.19646175`) and list author as Isaac Gerber (matching `CITATION.cff`). Add `doi:` and `identifiers:` entries (concept + versioned) to `CITATION.cff`. DOI was minted by Zenodo when v3.1.3 was released.
 - **`ChaisemartinDHaultfoeuille` heterogeneity + within-group-varying PSU/strata now supported under Binder TSL** - `fit(heterogeneity=..., survey_design=...)` no longer raises `NotImplementedError` when the resolved design's PSU or strata vary across the cells of a group. On the **Binder TSL** branch (`compute_survey_if_variance`), the heterogeneity WLS coefficient IF is expanded to observation level via the cell-period allocator `ψ_i = ψ_g * (w_i / W_{g, out_idx})` on the post-period cell — the DID_l post-period single-cell convention shipped in v3.1.x. Under PSU=group the PSU-level Binder TSL variance is byte-identical to the previous release (PSU-level aggregate telescopes to `ψ_g`); under within-group-varying PSU, mass lands in the post-period PSU of the transition. The **Rao-Wu replicate-weight** branch (`compute_replicate_if_variance`) retains the legacy group-level allocator `ψ_i = ψ_g * (w_i / W_g)`: replicate variance computes `θ_r = sum_i ratio_ir * ψ_i` at observation level and is therefore not PSU-telescoping, so the cell-period allocator would silently change the replicate SE whenever a replicate column's ratios vary within group (e.g., per-row replicate matrices). Replicate + heterogeneity fits therefore produce byte-identical SE to the previous release, and the newly-unblocked `heterogeneity=` + within-group-varying PSU combination is unreachable under replicate designs by construction (`SurveyDesign` rejects `replicate_weights` combined with explicit `strata/psu/fpc`).
 
@@ -93,6 +93,38 @@ Measuring campaign lift? Evaluating a product launch? diff-diff handles the caus
 - **[Brand awareness survey tutorial](docs/tutorials/17_brand_awareness_survey.ipynb)** - Full example with complex survey design, brand funnel analysis, and staggered rollouts
 - **Have BRFSS/ACS/CPS individual records?** Use [`aggregate_survey()`](docs/api/prep.rst) to roll respondent-level microdata into a geographic-period panel with inverse-variance precision weights. The returned second-stage design uses analytic weights (`aweight`), so it works directly with `DifferenceInDifferences`, `TwoWayFixedEffects`, `MultiPeriodDiD`, `SunAbraham`, `ContinuousDiD`, and `EfficientDiD` (estimators marked **Full** in the [survey support matrix](docs/choosing_estimator.rst))
 
+### Experimental preview: `BusinessReport` and `DiagnosticReport`
+
+diff-diff ships two preview classes, `BusinessReport` and `DiagnosticReport`, that produce plain-English output and a structured `to_dict()` schema from any fitted result. **Both are experimental in this release** — wording, verdict thresholds, and schema shape will change as the library learns from real practitioner usage. Do not anchor downstream tooling on the schema yet; the experimental flag is noted in the CHANGELOG.
+
+```python
+from diff_diff import CallawaySantAnna, BusinessReport
+
+cs = CallawaySantAnna(base_period="universal").fit(
+    df, outcome="revenue", unit="store", time="month",
+    first_treat="first_treat", aggregate="event_study",
+)
+report = BusinessReport(
+    cs,
+    outcome_label="Revenue per store",
+    outcome_unit="$",
+    business_question="Did the loyalty program lift revenue?",
+    treatment_label="the loyalty program",
+    # Optional: pass the panel + column names so the auto-constructed
+    # DiagnosticReport can run data-dependent checks (2x2 pre-trends,
+    # Goodman-Bacon decomposition, EfficientDiD Hausman pretest).
+    # Without these the auto path still runs but skips those checks.
+    data=df,
+    outcome="revenue",
+    unit="store",
+    time="month",
+    first_treat="first_treat",
+)
+print(report.summary())
+```
+
+`BusinessReport` auto-constructs a `DiagnosticReport` so the summary mentions pre-trends, sensitivity, and design-effect findings in one call. Methodology (phrasing rules, verdict thresholds, schema stability) is documented in [docs/methodology/REPORTING.md](docs/methodology/REPORTING.md). Feedback on wording, applicability, and missing diagnostics is welcome — this is the part of the library most likely to evolve in the next few releases.
+
 Already know DiD? The [academic quickstart](docs/quickstart.rst) and [estimator guide](docs/choosing_estimator.rst) cover the full technical details.
 
 ## Features
 
@@ -57,6 +57,7 @@ See [Survey Design Support](docs/choosing_estimator.rst#survey-design-support) f
 
 Major landings since the prior roadmap revision. See [CHANGELOG.md](CHANGELOG.md) for the full history.
 
+- **`BusinessReport` and `DiagnosticReport`** - practitioner-ready output layer. Plain-English stakeholder summaries + unified diagnostic runner with a stable AI-legible `to_dict()` schema. `BusinessReport` auto-constructs `DiagnosticReport` by default so summaries mention pre-trends, robustness, and design-effect findings in one call. Estimator-native validation surfaces are routed through: SyntheticDiD uses `pre_treatment_fit` / `in_time_placebo` / `sensitivity_to_zeta_omega`; EfficientDiD uses its native `hausman_pretest`; TROP exposes factor-model fit metrics. See `docs/methodology/REPORTING.md` for methodology deviations including no-traffic-light gates, pre-trends verdict thresholds, and power-aware phrasing.
 - **ChaisemartinDHaultfoeuille (dCDH)** - full feature set: `DID_M` contemporaneous-switch, multi-horizon `DID_l` event study, analytical SE, multiplier bootstrap, TWFE decomposition diagnostic, dynamic placebos, normalized estimator, cost-benefit aggregate, sup-t bands, covariate adjustment (`DID^X`), group-specific linear trends (`DID^{fd}`), state-set-specific trends, heterogeneity testing, non-binary treatment, HonestDiD integration, and survey support (TSL + pweight).
 - **SyntheticDiD jackknife variance** (`variance_method='jackknife'`) with survey-weighted jackknife.
 - **SyntheticDiD validation diagnostics**.
@@ -78,8 +79,7 @@ Queued work, ordered by expected leverage. Each item is its own PR. Ordering is
 
 ### Practitioner-ready output
 
-- **`BusinessReport` class.** Plain-English summaries of any estimator's results with markdown export. Optional rich formatting via a `[reporting]` extra; core remains numpy/pandas/scipy only. Turns raw coefficients into stakeholder-ready artifacts.
-- **`DiagnosticReport` with context-aware `practitioner_next_steps()`.** Unified diagnostic runner that bundles parallel-trends, placebo, HonestDiD, Bacon decomposition, DEFF, EPV, and power diagnostics into one plain-English report. `practitioner_next_steps()` substitutes actual column names from fitted results instead of generic placeholders.
+- **Context-aware `practitioner_next_steps()`.** Substitutes actual column names from fitted results instead of generic placeholders, so next-step guidance is executable rather than illustrative. (Standalone follow-up to the `BusinessReport` / `DiagnosticReport` landing below; tracked under the AI-Agent Track too.)
 
 ### Practitioner tutorials
 
 
@@ -213,6 +213,16 @@
     plot_synth_weights,
 )
 from diff_diff.practitioner import practitioner_next_steps
+from diff_diff.business_report import (
+    BUSINESS_REPORT_SCHEMA_VERSION,
+    BusinessContext,
+    BusinessReport,
+)
+from diff_diff.diagnostic_report import (
+    DIAGNOSTIC_REPORT_SCHEMA_VERSION,
+    DiagnosticReport,
+    DiagnosticReportResults,
+)
 from diff_diff._guides_api import get_llm_guide
 from diff_diff.datasets import (
     clear_cache,
@@ -427,6 +437,12 @@
     "clear_cache",
     # Practitioner guidance
     "practitioner_next_steps",
+    "BusinessReport",
+    "BusinessContext",
+    "BUSINESS_REPORT_SCHEMA_VERSION",
+    "DiagnosticReport",
+    "DiagnosticReportResults",
+    "DIAGNOSTIC_REPORT_SCHEMA_VERSION",
     # LLM guide accessor
     "get_llm_guide",
 ]