Skip to content

🎉 Charts as first-class citizens in ETL: layered approach with grapher#6174

Open
pabloarosado wants to merge 26 commits into
masterfrom
etl-chart-config
Open

🎉 Charts as first-class citizens in ETL: layered approach with grapher#6174
pabloarosado wants to merge 26 commits into
masterfrom
etl-chart-config

Conversation

@pabloarosado

@pabloarosado pabloarosado commented May 28, 2026

Copy link
Copy Markdown
Contributor

Written by Claude Code — @pabloarosado at the wheel.

Human summary

Lets ETL create and edit charts as zero-dimension mdim collections, using a layered approach, so that changes made from the grapher chart admin supersede the ETL config.

Depends on owid/owid-grapher#6553.
Tracking issue: #6069
This PR is the etl-side of Phase 1 in the roadmap document.


Core implementation idea

A chart is defined in etl as an export://multidim step (with a .config.yml file) with dimensions: [] (i.e. no dropdown selectors; not to be confused with the grapher config's dimensions, which lists the plotted indicators) and exactly one view. This step gets routed through Collection.upsert_to_db to the new PUT /admin/api/charts/:id/etlConfig endpoint, which stores the chart's grapher config as a separate chart_configs row, pointed to by a new charts.configIdETL column (mirroring how variables.grapherConfigIdETL already works).

The rendered config is merge(variableETL, etlConfig, patch): ETL writes the etlConfig row, admins write patch, so the two never collide and an ETL re-push can't clobber an admin edit. We also set charts.catalogPath as the chart's stable ETL identity (like multi-dims already have).

Additional changes

  • Export multidim steps can now be executed just with the config file (even without a .py file).
  • Adapted the chart-preview skill to let Claude preview draft charts (this is important for newly created charts).
  • Adapted chart-editing skill to the new approach.
  • Adapted VSCode extensions (clickable dag steps and chart preview).
  • Adapted version tracker.
  • Created step to replicate the chick-culling laws chart as an example.

How to test

  • Fetch this branch and (pointing to the corresponding staging server) ask Claude to create a new chart (on whatever topic). Use the chart preview extension to visualize it.
  • Make changes in the chart admin (e.g. edit the title).
  • Run the export step of the chart again, and check that the title is still the one manually set from the admin.
  • From the chart admin, link again the title field to etl, and check that the chart uses the etl values again.
  • (After doing the test, remove the chart and the step).

Lets ETL push standalone Grapher charts as zero-dimension mdim
collections. A `.config.yml` with `dimensions: []` and exactly one
view is routed through `Collection.upsert_to_db` to a new endpoint
that writes the chart's ETL-authored grapher config into
`chart_configs.etlConfig` (added on the grapher side).

The endpoint lives on owid-grapher:
  PUT /admin/api/charts/:id/etlConfig

Admin patches in `chart_configs.patch` are preserved across ETL
re-pushes by construction — ETL and admin write to different
columns. No drift-check apparatus, no bot-user stamping, no
ETL_FORCE_CHART. The full rendered config is recomputed server-side
as merge(variableETL, etlConfig, patch).

Contents:
- `apps/chart_sync/admin_api.py`: new `put_chart_etl_config` and
  `delete_chart_etl_config` methods.
- `etl/collection/chart_upsert.py`: new module that translates a
  zero-dim collection view into a grapher chart config (resolving
  catalog paths to variable IDs) and pushes it via the new endpoint.
  Creates the chart with a minimal bootstrap if it doesn't exist yet.
- `etl/collection/model/core.py`: `Collection.upsert_to_db` short-
  circuits to `upsert_collection_as_chart` when `dimensions` is empty.
- `etl/steps/__init__.py`: `ExportStep` can run from a `.config.yml`
  with no Python script for `export://multidim/...` paths.
- `apps/step_update/cli.py` + `etl/version_tracker.py`: recognise
  YAML-only export steps in step discovery and version tracking.
- `schemas/multidim-schema.json`: documents that `dimensions` can be
  empty for single-chart collections (the schema already allowed it
  structurally — `minItems` was never set).
- `etl/steps/export/multidim/animal_welfare/latest/banning_of_chick_culling.config.yml`
  + DAG entry: chick-culling chart as the example/test artifact.

Depends on owid-grapher PR #6553 (the endpoint and the etlConfig
column don't exist on master yet). To test end-to-end on staging:
SSH into the staging server, check out `etl-chart-config` in the
owid-grapher folder, run the migration, restart admin, then run
this ETL step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@owidbot

owidbot commented May 28, 2026

Copy link
Copy Markdown
Contributor

Quick links (staging server):

Site Dev Site Preview Admin Wizard Docs Docs Preview

Login: ssh owid@staging-site-etl-chart-config

chart-diff: ✅ No charts for review.
data-diff: ✅ No differences found

Automatically updated datasets matching excess_mortality|covid|fluid|flunet|country_profile|garden/ihme_gbd/2019/gbd_risk are not included

Edited: 2026-06-09 14:51:16 UTC
Execution time: 10.48 seconds

- `Collection.upsert_to_db`: when dimensions are empty, explicitly require
  exactly one view before dispatching to chart upsert. Previously the
  assertion lived only inside `upsert_collection_as_chart`; surfacing it at
  the dispatch point makes the implicit "zero-dim ⇒ single chart" behaviour
  switch concrete with a clear error for the malformed case.

- `dag/animal_welfare.yml`: rewrite the test-scaffolding-flavoured comment
  on the chick-culling export step to a normal one-liner.

Resolves codex findings 4 and 5 from the PR review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pabloarosado pabloarosado self-assigned this May 28, 2026
pabloarosado and others added 2 commits June 1, 2026 16:27
DAG entries for `export://multidim/...` steps backed by a `.config.yml`
(single-chart or mdim collections without a `.py`) were not clickable
because the resolver only looked for `.py`, `__init__.py`, or `.ipynb`.
Add `.config.yml` to the candidate list for `export://` URIs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single-chart `.config.yml` files (zero-dim collection + no sibling `.py`)
push to the chart admin endpoint and are created unpublished by default.
The public `/grapher/{slug}` route doesn't render unpublished charts, so
clicking "Preview chart" was failing for newly-created ETL-authored
charts. Detect the single-chart case and route the preview through
`/admin/grapher/{slug}` instead, which renders drafts.

`parseExportMultidim` becomes async and returns `isChart` + `chartSlug`
based on (a) `dimensions: []` in the .config.yml AND (b) no sibling
`.py` file (multi-dim steps like air_pollution use `dimensions: []` as
a placeholder filled programmatically — those still go via the mdim
route).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pabloarosado pabloarosado changed the title 📊🤖 ETL-authored single charts via chart_configs.etlConfig 🎉 Charts as first-class citizens in ETL: layered approach with grapher Jun 1, 2026
pabloarosado and others added 4 commits June 2, 2026 15:52
`OWIDEnv.admin_site` already ends in `/admin`, so prefixing `/admin/charts/...`
produced URLs like `http://.../admin/admin/charts/9138/edit`. Drop the duplicated
segment; matches the convention used by `datasets_admin_site` etc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `/grapher/<slug>.png` route only renders published charts. For drafts
it 404s — which is the common case while iterating on an ETL-authored
chart. owid-grapher already exposes `/grapher/by-uuid/<chart_configs.id>.png`
which serves any chart from R2's `byUUID/` directory, published or not.

Adds `get_chart_png_url.py`: takes a chart slug or numeric chart id,
queries the staging DB via `OWID_ENV.read_sql` to resolve it to the
`chart_configs.id` UUID, prints the by-uuid PNG URL. Accepts extra
grapher query params via `--key=value` flags (tab, time, country, …).

Updates SKILL.md to recommend this as the primary path; keeps the
agent-browser path as a fallback for MDIM data pages and interactive UI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous content documented a `.chart.yml` system in `etl/steps/graph/`
that was an unmerged experiment and no longer exists in the repo. Rewrite
the skill to cover the actual Phase 1 workflow: single ETL-authored charts
as zero-dim mdim collections in `etl/steps/export/multidim/`, pushed via
`chart_configs.etlConfig`, previewed via the `by-uuid` PNG route.

Covers: file layout, minimum viable config, DAG entry, indicator catalog
paths + display.name overrides, common chart-config edits, push command,
preview via the check-chart-preview skill, and the admin co-editing model.

Adds a "when NOT to use" section pointing at create-multidim for charts
with dropdowns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The hens-by-housing-system chart was an iteration experiment, not a
canonical reference. The chick-culling chart is the committed Phase 1
example in this repo, so cite only that. Replace the eggs-and-hens
indicator paths in the display-name example with placeholders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pabloarosado

Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d4954b3a57

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread etl/collection/chart_upsert.py
@pabloarosado pabloarosado marked this pull request as ready for review June 3, 2026 08:46

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds “single-chart” support to the ETL multidim export pipeline by treating a zero-dimension collection (dimensions: [] with exactly one view) as a regular Grapher chart, pushed via a layered ETL-config endpoint so admin edits can override ETL defaults. This also updates tooling (ETL step execution, version tracking, VSCode extensions, and Claude skills) to support YAML-only export steps and draft chart preview flows.

Changes:

  • Route zero-dimension multidim collections through a new chart upsert path (create/update chart + PUT chart etlConfig) instead of the MDIM endpoint.
  • Allow YAML-only export://multidim/... steps to execute without a .py file (and make tooling resolve .config.yml as the “script”).
  • Update VSCode extensions + Claude skills to preview draft charts correctly and to link DAG steps to .config.yml when needed.

Reviewed changes

Copilot reviewed 17 out of 19 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
vscode_extensions/clickable-dag-steps/src/extension.ts Adds .config.yml as a resolvable target for YAML-only export steps.
vscode_extensions/clickable-dag-steps/package.json Bumps extension version to 0.0.5.
vscode_extensions/clickable-dag-steps/CHANGELOG.md Documents YAML-only export step support (but version history has a gap).
vscode_extensions/chart-preview/src/extension.ts Adds chart-vs-mdim classification for export/multidim and uses admin preview routes for drafts.
vscode_extensions/chart-preview/package.json Bumps extension version to 0.0.2.
schemas/multidim-schema.json Updates schema guidance to explicitly allow dimensions: [] for single-chart mode.
etl/version_tracker.py Allows get_path_to_script to resolve .config.yml for export/multidim steps.
etl/steps/export/multidim/animal_welfare/latest/banning_of_chick_culling.config.yml Adds a concrete example of an ETL-authored single-chart collection.
etl/steps/__init__.py Enables YAML-only export/multidim execution via ExportStep fallback runner.
etl/collection/model/core.py Routes zero-dimension collections to chart upsert; rejects zero-dim collections with multiple views.
etl/collection/chart_upsert.py Implements chart creation/update + ETL config push for single-chart collections.
dag/animal_welfare.yml Wires the example export step into the DAG.
apps/step_update/cli.py Makes step_update copy YAML-only export steps (no .py) correctly.
apps/chart_sync/admin_api.py Adds AdminAPI methods to PUT/DELETE chart etlConfig.
.claude/skills/check-chart-preview/SKILL.md Updates preview workflow, emphasizing by-UUID PNG rendering for drafts.
.claude/skills/check-chart-preview/get_chart_png_url.py Adds helper to resolve slug/id → by-UUID PNG URL on staging.
.claude/skills/chart-editing/SKILL.md Updates “chart-editing” skill docs to target single-chart .config.yml in export/multidim.

Comment thread etl/collection/chart_upsert.py
Comment thread vscode_extensions/chart-preview/src/extension.ts
Comment thread vscode_extensions/clickable-dag-steps/CHANGELOG.md
Comment thread etl/steps/__init__.py
- chart_upsert: set topic_tags on newly created ETL-authored charts (admin-owned after creation)
- chart-preview ext: use fs.access for .py existence check instead of readFile
- clickable-dag-steps: backfill 0.0.4 changelog entry

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread schemas/multidim-schema.json
pabloarosado and others added 2 commits June 3, 2026 11:59
…quired

- Schema: drop `title`/`default_selection` from top-level required; drop `title_variant` from the title block's required.
- Model: `Collection.title` and `Collection.default_selection` default to None.
- Runtime: error when an mdim collection has no title; per-view warning when a non-map-only view has no entity selection fallback (neither collection-level `default_selection` nor per-view `selectedEntityNames`).
- Example YAML + chart-editing skill: drop top-level `title:` / `default_selection:` from single-chart configs (they were ignored in chart-mode anyway).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Marigold

Marigold commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator

Very nice! Worked a charm for me. A couple of ideas, but nothing prevents merging.

  1. We risk overriding hand authored charts with the same slug. If the chart doesn't have etlConfig, we should be cautious about overwriting it.
  2. Would it help to have some tests for e.g. _build_chart_config?
  3. If we start using this intensively, having a tool to migrate hand authored chart to ETL would be cool
  4. We should highlight ETL-authored charts in Admin

@lucasrodes lucasrodes left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, @pabloarosado! This is a great addition. I quickly scanned the code, and left some minor suggestions. But overall, nothing critical! Excited to try out this tool in PROD.

I've tried the workflow with Claude and it mostly worked fine. Below are some comments on the different stages and a final one on removing charts

Creating the chart

Fetch this branch and (pointing to the corresponding staging server) ask Claude to create a new chart (on whatever topic).
This worked fine! I prompted Claude with the following instruction

i want to test the work from this PR. I've been suggested
  to create a chart to begin with, can you assist me w that?
  I'd like a scatterplot  that shows life expectancy vs
  population.

And it generated a new YAML file in etl/steps/export/multidim/demography/latest/life_expectancy_vs_population.config.yml. It looked good.

On the flip side, I felt it was a bit slow (took ~4 minutes). I am not sure if this is expected, or maybe there is something we can improve in the SKILL to make this faster. From my understanding, the skill presumably in use (chart-editing) is focussed on editing charts. Should we adapt it to also have the appropriate context for creating charts?

Claude's full transcript
● Skill(chart-editing)
  ⎿  Successfully loaded skill

● Let me look at the current state of the graph steps and PR
  work to understand the setup.

  Searched for 1 pattern (ctrl+o to expand)

● Bash(git -C /home/lucas/repos/etl log --oneline master..HEAD
      | head -20; echo ---; git -C /home/lucas/repos/etl diff
      master...HEAD --stat | tail -30)
  ⎿  8c6a5d037 Merge remote-tracking branch 'origin/maste
     r' into refactor-migrate-co2
     6b640f214 📊 Guinea Worm data update (#6178)
     … +45 lines (ctrl+o to expand)

● Bash(git -C /home/lucas/repos/etl branch --show-current;
      echo ---; git -C /home/lucas/repos/etl log --oneline -8;
      echo ---; rg --files /home/lucas/repos/etl -g "*.ch…)
  ⎿  refactor-migrate-co2
  work to understand the setup.

  Searched for 1 pattern (ctrl+o to expand)

● Bash(git -C /home/lucas/repos/etl log --oneline master..HEAD | head -20; echo ---; git -C /home/lucas/repos/etl diff
      master...HEAD --stat | tail -30)
  ⎿  8c6a5d037 Merge remote-tracking branch 'origin/master' into refactor-migrate-co2
     6b640f214 📊 Guinea Worm data update (#6178)
     d9e42c81b Add women's rights, human rights, crime, media, social connection and technology indicators to Int
     … +36 lines (ctrl+o to expand)

● Bash(git -C /home/lucas/repos/etl branch --show-current; echo ---; git -C /home/lucas/repos/etl log --oneline -8;
      echo ---; rg --files /home/lucas/repos/etl -g "*.ch…)
  ⎿  refactor-migrate-co2
     ---
     8c6a5d037 Merge remote-tracking branch 'origin/master' into refactor-migrate-co2
     … +45 lines (ctrl+o to expand)

● Bash(git -C /home/lucas/repos/etl branch -a | rg -i "chart" ; echo ---; git -C /home/lucas/repos/etl log --oneline
      -5 etl-chart-config 2>/dev/null; echo ---; git -C…)
  ⎿    bug-chart-diff-failure
       bug-chart-diff-query
       bug-remove-frame-chart
     … +57 lines (ctrl+o to expand)

● The branch exists but I need to see what's actually in it — the paths the skill mentions didn't match. Let me check
  what the PR changed:

● Bash(git -C /home/lucas/repos/etl diff master...etl-chart-config --stat | tail -40)
  ⎿   .claude/skills/chart-editing/SKILL.md              | 230 +++++++++++----------
      .claude/skills/check-chart-preview/SKILL.md        |  40 ++--
      .../check-chart-preview/get_chart_png_url.py       |  78 +++++++
     … +18 lines (ctrl+o to expand)

● Bash(git -C /home/lucas/repos/etl show
      etl-chart-config:etl/steps/export/charts/animal_welfare/latest/banning_of_chick_culling.config.yml; echo
      ===DAG===; git -C /ho…)
  ⎿  Error: Exit code 128
     fatal: path 'etl/steps/export/charts/animal_welfare/latest/banning_of_chick_culling.config.yml' does not exist in
     'etl-chart-config'
     (eval):1: ==DAG=== not found

● Bash(git -C /home/lucas/repos/etl show etl-chart-config:.claude/skills/chart-editing/SKILL.md)
  ⎿  ---
     name: chart-editing
     description: Create or edit an ETL-authored Grapher chart — a single-chart `.config.yml` in `etl/steps/expor
     … +161 lines (ctrl+o to expand)

● Bash(git -C /home/lucas/repos/etl checkout etl-chart-config && git -C /home/lucas/repos/etl status --short | head)
  ⎿  Switched to branch 'etl-chart-config'
     Your branch is up to date with 'origin/etl-chart-config'.
     ?? scripts/apply_softhyphen_prefixes.py
     … +4 lines (ctrl+o to expand)

  Listed 1 directory (ctrl+o to expand)

● Bash(rg "data://grapher/demography/2024-07-18/population|data://grapher/demography/2025-10-22/life_expectancy"
      /home/lucas/repos/etl/dag/*.yml -l | head -5; echo ---…)
  ⎿  /home/lucas/repos/etl/dag/demography.yml
     /home/lucas/repos/etl/dag/static_viz.yml
     ---LE-COLS---
     … +19 lines (ctrl+o to expand)

  Searched for 1 pattern (ctrl+o to expand)

● Bash(rg -A5 "grapher/demography/2024-07-18/population" /home/lucas/repos/etl/dag/demography.yml | head -10; echo
      ---; ls /home/lucas/repos/etl/etl/steps/data/grapher…)
  ⎿  Error: Exit code 2
       data://grapher/demography/2024-07-18/population_doubling_times:
         - data://garden/demography/2024-07-18/population_doubling_times

       ## Population comparison
       data://grapher/demography/2023-07-03/world_population_comparison:
         - data://garden/demography/2023-06-27/world_population_comparison:
     ---
     population_doubling_times.py
     __pycache__

● Branch switched — the skill list now reflects the PR version. Now let me pin down the population grapher step and
  the exact indicator paths:

  Listed 1 directory (ctrl+o to expand)

● Bash(.venv/bin/python -c "
      from owid.catalog import Dataset…)
  ⎿  ['historical', 'population', 'population_density', 'population_growth_rate', 'projections']
     ['population', 'source', 'world_pop_share']
     ---
     … +40 lines (ctrl+o to expand)

● Indicators identified. Let me check how existing configs handle scatterplots (especially the continents color
  dimension):

  Searched for 4 patterns (ctrl+o to expand)

● I have everything I need. Let me verify how config-only export steps work (whether a .py is needed) and check the
  chick-culling DAG entry:

● Bash(ls /home/lucas/repos/etl/etl/steps/export/multidim/animal_welfare/latest/; echo ---DAG---; rg -n -B3 -A4
      "multidim/animal_welfare" /home/lucas/repos/etl/dag/ani…)
  ⎿  banning_of_chick_culling.config.yml
     ---DAG---
     112-  #
     … +55 lines (ctrl+o to expand)

● YAML-only config steps confirmed. Now let me look at where to add the DAG entry and check the multidim schema for
  the scatter indicator shape:

● Bash(rg -n "life_expectancy" /home/lucas/repos/etl/dag/demography.yml | head; echo ---; tail -20
      /home/lucas/repos/etl/dag/demography.yml; echo ---SCHEMA---; .venv/b…)
  ⎿  184:  data://grapher/demography/2025-10-22/life_expectancy:
     185:    - data://garden/demography/2025-10-22/life_expectancy:
     208:  data://grapher/oecd/2023-10-11/life_expectancy_birth:
     … +89 lines (ctrl+o to expand)

  Read 1 file (ctrl+o to expand)

● Now I'll create the chart config and DAG entry:

● Write(etl/steps/export/multidim/demography/latest/life_expectancy_vs_population.config.yml)
  ⎿  Wrote 31 lines to etl/steps/export/multidim/demography/latest/life_expectancy_vs_population.config.yml
       1 topic_tags:
       2   - "Life Expectancy"
       3 dimensions: []
       4 views:
       5   - dimensions: {}
       6     indicators:
       7       y:
       8         - catalogPath: "life_expectancy_at_birth#life_expectancy_0"
       9           display:
      10             name: "Life expectancy at birth"
     … +21 lines (ctrl+o to expand)

● Update(dag/demography.yml)
  ⎿  Added 8 lines
      188          - data://garden/demography/2025-10-22/life_tables
      189          - data://garden/un/2024-07-12/un_wpp
      190
      191 +  #
      192 +  # Life expectancy vs. population — chart authored in ETL.
      193 +  #
      194 +  export://multidim/demography/latest/life_expectancy_vs_population:
      195 +    - data://grapher/demography/2025-10-22/life_expectancy
      196 +    - data://grapher/demography/2024-07-15/population
      197 +    - data://grapher/regions/2023-01-01/regions
      198 +
      199    # Broken limits of Life Expectancy
      200    data://grapher/demography/2025-10-22/broken_limits_le:
      201      - data://garden/demography/2025-10-22/broken_limits_le:

● Both files are in place. Let me validate the DAG wiring with a quick dry-run (no execution):

● Bash(.venv/bin/etlr export://multidim/demography/latest/life_expectancy_vs_population --export --dry-run 2>&1 | tail
      -15)
  ⎿  --- Detecting which steps need rebuilding...
     OK (1.1s)
     --- Would run 9 steps (at least 3m 51.1s):
     … +9 lines (ctrl+o to expand)

● Everything resolves. The chart is ready to test. Here's what I set up:

  What was created

  1. Chart config — etl/steps/export/multidim/demography/latest/life_expectancy_vs_population.config.yml

  A zero-dimension mdim (the PR's single-chart format) with one scatter view:
  - y: life_expectancy_at_birth#life_expectancy_0 (OMM life expectancy, demography/2025-10-22/life_expectancy)
  - x: population#population (demography/2024-07-15/population), log scale since population spans ~5 orders of
  magnitude
  - color: regions#owid_region — continent coloring, same pattern the COVID scatters use

  2. DAG entry — dag/demography.yml:191 with the three grapher deps.

  The dry-run validates fine — public slug will be life-expectancy-vs-population.

  Next step (for you to run)

  Some upstream deps aren't built locally (it wants to rebuild the hmd → life_tables chain, ~4 min), so fetch them
  from the catalog instead:

  PREFER_DOWNLOAD=1 .venv/bin/etlr export://multidim/demography/latest/life_expectancy_vs_population --export
  --private

  On success it prints admin_url=http://staging-site-etl-chart-config/admin/charts/<id>/edit.

  Then for iterating on the YAML afterwards, the fast loop is:

  .venv/bin/etlr export://multidim/demography/latest/life_expectancy_vs_population --export --force --only

  Once it's pushed, I can fetch the PNG preview (via get_chart_png_url.py from the PR) so we can check the scatter
  visually — let me know how the run goes.

✻ Cooked for 4m 14s

Previewing the chart

My understanding from Claude's output is that I need to execute the export step. So the preview is only possible after the chart exists in Admin. Just to confirm, did I understand this correctly?

After running the etlr command, I went ahead and typed /check-chart-preview.

I like the previewing chart a lot, but I think it'd be helpful at some point to get the URL to the chart. Maybe when etlr runs the export:// step and upserts a chart to the DB, so that it is easier to go to the live chart.

E.g. some message after the --- Finished export://multidim/demography/latest/life_expectancy_vs_population (2.8s), like chart available at staging-.../admin/charts/<chart_id>/edit.

Updating & syncing with Admin

Run the export step of the chart again, and check that the title is still the one manually set from the admin.

works perfectly! Also, if then i go and edit the title in ETL to match Admin's, then Admin recognizes that it is synced with ETL.

From the chart admin, link again the title field to etl, and check that the chart uses the etl values again.

Works!

Removing charts

One thing that came to mind is what should be the behavior when someone removes a chart in Admin. If that happens, and we don't do the same in ETL (e.g. archiving the chart step), the chart will appear again in PROD. Do you have an intuition on how we can solve this to avoid human errors? Somehow we should make ETL aware of the list of catalog paths of the "removed charts".

Maybe we need to add a new table in Grapher (e.g. charts_removed) that lists the IDs and chart paths of the removed charts (along with other fields like removedAt, removedBy, etc.). This is a bit related to the idea of having catalogPath in the charts table, which I shared in owid/owid-grapher#6553 (comment).

Comment thread etl/collection/model/core.py Outdated
Comment thread etl/collection/chart_upsert.py Outdated
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@pabloarosado

Copy link
Copy Markdown
Contributor Author

Very nice! Worked a charm for me. A couple of ideas, but nothing prevents merging.

  1. We risk overriding hand authored charts with the same slug. If the chart doesn't have etlConfig, we should be cautious about overwriting it.
  2. Would it help to have some tests for e.g. _build_chart_config?
  3. If we start using this intensively, having a tool to migrate hand authored chart to ETL would be cool
  4. We should highlight ETL-authored charts in Admin

Hi Mojmir, thanks for the review! I agree with all your points:

  1. I can't think of any dangerous cases. Chart diff would catch differences between full chart configs.
  2. Done.
  3. This is already in TODO for Phase 1.
  4. Added to TODO for Phase 1.

pabloarosado and others added 2 commits June 4, 2026 14:00
validate_title() raised for Explorers (Collections with dimensions) even
though explorers manage their own title, breaking test_explorer_migration.
Gate both validations on _collection_type == "multidim".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@pabloarosado

Copy link
Copy Markdown
Contributor Author

Hi @lucasrodes, thanks for trying it out and for the feedback! Let me address your comments:

  • From looking at the transcript, I think there are multiple things taking Claude time (e.g. changing branches, getting more context, etc.). We could figure out some way to improve the skill to avoid some of those steps and reduce the time. Let's do that in a separate PR.
  • On previewing: The chart preview extension already executes the export step on opening (even before the chart exists on grapher). And chart preview already has a link to the corresponding chart preview at the top.
  • The /check-chart-preview skill is mostly for Claude to be able to inspect the outputs. The user should be inspecting the chart preview directly (or go to the admin).
  • On removing charts: good point. Your charts_removed idea make sense. More broadly we currently don't have a good way to handle the removal of charts on staging (e.g. when you delete a chart on staging, chart-sync doesn't delete the chart in prod; you just get a warning message). We should handle the issue of how we handle chart removals in a separate issue/PR.

I'll have a look at the rest of your comments, thanks for the thorough review!

pabloarosado and others added 2 commits June 4, 2026 15:35
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The admin etlConfig endpoint only checks the schema version, not structure, so a
typo'd field or wrong type would be stored and silently render wrong. Validate the
built config against the locally vendored grapher schema before pushing (no network),
surfacing a clear error pointing at the offending field. Skips if the config's schema
version isn't vendored locally.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
pabloarosado and others added 10 commits June 5, 2026 15:34
* 🎉🤖 Key ETL charts by catalogPath (parallel to mdims)

Persist the collection's catalog_path on charts.catalogPath (via the etlConfig
endpoint) and look up the chart by it instead of by slug. This gives single
charts the same stable ETL identity mdims already have
(multi_dim_data_pages.catalogPath), with the slug left as the mutable public URL.

Behavior change: ETL no longer adopts a hand-authored chart that merely shares a
slug — it keys on catalogPath, so an accidental slug collision surfaces at publish
time instead of silently overwriting.

Pairs with owid-grapher PR (adds the charts.catalogPath column). Targets the
chart-config feature branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* 📜🤖 Fix put_chart_etl_config docstring for row-model storage

The ETL config is stored in its own chart_configs row (charts.configIdETL),
not in a chart_configs.etlConfig column.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ETL-authored charts are created independently on staging and prod, so they
have different ids but a shared catalogPath. Match them across environments
by catalogPath (falling back to id+createdAt for hand-authored charts), sync
the etlConfig layer alongside the main config, and bootstrap a created chart
with its etlConfig + catalogPath so ETL can adopt it. Approvals/conflicts for
"catalog-path twins" (prod minted its own row) key on a NULL target timestamp.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ins)

set_status recorded a catalog-path twin's approval with the prod chart's
updatedAt, but _get_approvals looks twins up with a NULL target timestamp —
so re-approving an existing ETL chart would lose the approval. Record via
_target_updated_at_for_review so recording and lookup agree.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… columns

Production's charts table doesn't have catalogPath/configIdETL yet (owid-grapher
#6553 isn't merged to master), so chart-diff — which loads prod charts — failed
with "Unknown column", breaking the staging build. Added a
PROD_HAS_ETL_CHART_COLUMNS flag (default False): chart-diff falls back to
id+createdAt matching and defers those columns out of the prod query. No
practical loss yet (prod has no ETL/catalogPath charts). TODO: remove the flag +
defer() once the grapher schema is in prod.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the PROD_HAS_ETL_CHART_COLUMNS flag with a runtime check of the target
(production) DB's information_schema: chart-diff uses catalogPath matching only
when charts.catalogPath/configIdETL actually exist there, otherwise it defers
those columns out of the query and falls back to id+createdAt matching. The ETL
PR merges after owid-grapher #6553, so prod will already have the columns by
then and chart-diff switches over on its own — no flag to flip or code to touch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cache_all built a canonical, hashable form of the arguments and then called the
wrapped function with that canonical form instead of the originals. A DataFrame
argument therefore arrived as a ("__obj__", "DataFrame", ...) tuple, so e.g.
producer_analytics' _process_df crashed with "'tuple' object has no attribute
'groupby'" on the non-Streamlit (CLI/test) path. Now the canonical form is used
only as the cache key; the function is always called with the original args.

Pre-existing bug (from #5056), unrelated to the chart-config work — it just
happens to surface in the staging integration tests on this branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… run

test_chart_diff.py imported chart_diff at module top. pytest imports every test
module during collection (even ones deselected by -m integration), and importing
chart_diff applies its module-level @st_cache_data decorators, which call the
@cache'd is_running_in_streamlit(). At collection time there's no Streamlit
runtime, so it froze to False process-wide and routed producer_analytics'
cached _process_df onto the buggy non-Streamlit path, crashing an unrelated
integration test. Moving the import into the test bodies keeps collection clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ETL-authored charts now set isInheritanceEnabled=True explicitly when the chart
row is first created, so grapher_config defined on an indicator flows into every
chart built on it. Set only at creation, so existing charts (ETL or admin) are
never touched. No owid-grapher change needed; the admin create endpoint already
honors the inheritance flag.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Charts that pre-date ETL authorship have no catalogPath, so the
catalogPath-only lookup failed against any DB where the step had never
run (rebuilt staging, prod at merge time) and crashed on the slug
collision with the existing chart. Fall back to the slug and adopt the
chart: put_chart_etl_config stamps the catalog path, so later runs find
it by catalogPath directly. Refuse to adopt a chart already owned by a
different ETL step.

Verified on staging: first run adopts chart 7118 with a byte-identical
rendered config (admin map override preserved, no version bump); second
run takes the regular catalogPath update path.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants