MDIM/Explorer config harmonization #4032

Open
lucasrodes opened this issue Feb 25, 2025 · 10 comments

Comments

@lucasrodes
Member

lucasrodes commented Feb 25, 2025

Currently, we have tooling for MDIMs and ETL-export Explorers in etl.multidim and etl.explorer. However, these two modules could share some logic. The current structure is not very suitable for this.

Generally, this tooling is used for handling config files, which ideally should be very close to one another.

Proposal

  • New module: etl.collections. In it, we have:
    • etl.collections.base (preliminary name): which contains common logic.
    • etl.collections.utils (preliminary name): which contains smaller util functions.
    • etl.collections.multidim: MDIM-specific logic
    • etl.collections.explorer: Explorer-specific logic.
    • etc.

Others:

  • We need classes for the config of MDIMs and Explorers (wrappers around the plain YML config).
  • We need to figure out a way to load the enriched config from a step. Currently, we rely on paths.load_mdim_config, which returns a plain dictionary. This function is in etl.helpers, which can't really import from etl.collections.multidim, since the latter already imports the former.
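
A minimal sketch of what such a config wrapper could look like, assuming a dataclass-based design. The names `CollectionConfig`, `from_dict`, `to_dict`, and the `title`/`views` keys are hypothetical and not taken from the ETL codebase; the sketch only illustrates wrapping the plain dictionary in a typed object that MDIMs and Explorers could share.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CollectionConfig:
    """Hypothetical typed wrapper around a plain config dictionary."""

    title: str
    views: list[dict[str, Any]] = field(default_factory=list)
    # Keys the wrapper does not model explicitly are kept, not dropped.
    extra: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "CollectionConfig":
        """Build the wrapper from the dictionary loaded from YAML."""
        known = {"title", "views"}
        if "title" not in raw:
            raise ValueError("config is missing required key 'title'")
        return cls(
            title=raw["title"],
            views=list(raw.get("views", [])),
            extra={k: v for k, v in raw.items() if k not in known},
        )

    def to_dict(self) -> dict[str, Any]:
        """Return a plain dictionary again, e.g. for upserting or exporting."""
        return {"title": self.title, "views": self.views, **self.extra}


# Illustrative input only; real configs have more fields.
raw_config = {"title": "Example", "views": [{"indicator": "x"}], "topic_tags": ["demo"]}
config = CollectionConfig.from_dict(raw_config)
```

If a class like this lived in a module that does not import etl.helpers, the loader could return it without creating a circular import. That placement is an assumption for illustration, not something decided in this issue.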

Related work

@lucasrodes lucasrodes changed the title from "MDIM/Explorer tooling" to "MDIM/Explorer tooling harmonization" Feb 25, 2025
@lucasrodes lucasrodes self-assigned this Feb 25, 2025
@lucasrodes
Member Author

Partly addressing this in #4030

@Marigold
Collaborator

Let us know when you plan to move files or a big restructure, please. There are some open PRs for mdims and rebasing them could get nasty.

@lucasrodes
Member Author

lucasrodes commented Feb 26, 2025

I've merged my changes from #4030, which set up the etl.collections namespace and moved some of the logic we had in etl.multidim into it.

I'm resolving conflicts now:

@pabloarosado
Contributor

Hey @lucasrodes thanks a lot for starting this. For this specific issue, is there anything left to do, or do you want to consider it a tracking issue?

@lucasrodes
Member Author

Hey Pablo! So this issue is rather high-level, so probably belongs to the "tracking realm".

I'll keep an updated list of related PRs/issues in the description.

@pabloarosado
Contributor

Thanks @lucasrodes, I see some overlap with:

Feel free to integrate this into one of those already existing issues, to avoid too much dispersion.

@lucasrodes
Member Author

lucasrodes commented Feb 27, 2025

@pabloarosado The description of #3969 already points readers to #3992, so I'll go ahead and close it.

On #3992, I think it is more general than this one, since it also considers things like: update workflow, wizard app, csv-to-etl migration of explorers, etc. This issue is instead focussed on harmonizing how we handle the configs of ETL-based explorers and MDIMs in ETL, so they almost feel the same.

Note that this issue is actually mentioned in the description of #3992 (point 2).

I've edited the description of this issue a bit to make it more explicit.

@lucasrodes lucasrodes changed the title from "MDIM/Explorer tooling harmonization" to "MDIM/Explorer config harmonization" Feb 27, 2025
@lucasrodes
Member Author

lucasrodes commented Mar 4, 2025

I've finished the first iteration of harmonizing the tooling for Explorers and MDIMs in #4035. COVID explorer and mdims now use this new logic.

Find below a more detailed description of the improvements from this work.

Model summary

I've abstracted config logic from Explorers and MDIMs, and put a model in place (very similar to what we have in owid.catalog.meta).

Diagram in etl.collections

flowchart LR
    %% Define nodes
    A[explorer]
    B[multidim]
    C[common]
    D[model]
    E[utils]

    %% Define edges
    A --> D
    A --> C
    A --> E

    B --> D
    B --> C
    B --> E

    C --> D

    D --> E

Summary of the module structure:

  • model
    Encapsulates the abstraction of the data model used by both explorer and multidim.
  • explorer & multidim
    Provide specialized tooling (e.g., logic, user-facing features) specific to Explorer and Multidimensional capabilities.
  • common
    Contains shared functionality and helper logic extracted from explorer and multidim so they can both utilize it.
  • utils
    Includes general-purpose utilities used by any module. Should remain independent (i.e., does not import from other modules) to prevent circular dependencies.
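
The import rules in the diagram can be checked mechanically. The sketch below encodes the edges of the flowchart as a plain dictionary and uses the standard-library `graphlib` module to confirm that they are free of cycles. The `DEPENDENCIES` mapping and the `import_order` helper are illustrative names, not part of etl.collections.

```python
from graphlib import CycleError, TopologicalSorter

# Each module maps to the modules it imports, as drawn in the diagram above.
DEPENDENCIES = {
    "explorer": {"model", "common", "utils"},
    "multidim": {"model", "common", "utils"},
    "common": {"model"},
    "model": {"utils"},
    "utils": set(),  # must stay independent of the other modules
}


def import_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which the modules can be imported.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


order = import_order(DEPENDENCIES)
```

With these edges, `utils` comes first and `model` precedes `common`. Adding an import from `utils` to any other module makes `import_order` raise `CycleError`, which is the situation the note on `utils` is meant to prevent.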

Examples

MDIM

upsert_multidim_data_page encapsulates the logic for validation, processing, and upserting to the DB.

for fname in filenames:
    paths.log.info(fname)
    config = paths.load_mdim_config(fname)
    multidim.upsert_multidim_data_page(
        config=config,
        paths=paths,
        mdim_name=fname_to_mdim_name(fname),
    )

Explorer

create_explorer encapsulates the logic for validation, processing, and uploading to owid-content. In the future, we should probably have an upsert_explorer method.

def run(dest_dir: str) -> None:
    #
    # Load inputs.
    #
    # Load grapher config from YAML
    config = paths.load_explorer_config()
    # Create explorer
    ds_explorer = create_explorer(
        dest_dir=dest_dir,
        config=config,
        paths=paths,
        tolerate_extra_indicators=True,
    )

Future work

  • We need a schema for Explorers, to validate the YAML config, like we do with MDIMs.
    • The config of explorers is still specific to explorers (minor differences now); I think we should try to modify it and bring it closer to MDIMs.
    • At the moment the source of truth for the "schema" of an explorer view seems to live in a TypeScript file. It should have its own JSON file!
  • We need a place in the DB for explorers (connected to the point above about defining a schema).
  • Test these changes by migrating some explorers to use this model.
  • Add some docs as it becomes clearer.
  • Wizard templating.
  • Test other kinds of explorers: code/yaml combinations.
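
To illustrate the first item, a schema check for an explorer config could start as small as the sketch below. The required keys and types are invented for the example, and `validate_config` is a hypothetical helper. A real schema would live in a JSON Schema file, and a dedicated validation library would likely replace this hand-written check.

```python
from typing import Any

# Hypothetical minimal "schema": required key -> expected Python type.
EXPLORER_REQUIRED = {"config": dict, "dimensions": list, "views": list}


def validate_config(raw: dict[str, Any], required: dict[str, type]) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for key, expected in required.items():
        if key not in raw:
            problems.append(f"missing key: {key}")
        elif not isinstance(raw[key], expected):
            problems.append(
                f"{key}: expected {expected.__name__}, got {type(raw[key]).__name__}"
            )
    return problems


# A deliberately broken config: "dimensions" is absent and "views" is a string.
errors = validate_config({"config": {}, "views": "oops"}, EXPLORER_REQUIRED)
# errors == ["missing key: dimensions", "views: expected list, got str"]
```

Collecting every problem before failing, instead of raising on the first one, tends to be friendlier when editing YAML by hand.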

@pabloarosado
Contributor

Thanks @lucasrodes, this is looking really good! One idea to make the workflow of explorers/mdims a bit more straightforward (more similar to any other ETL data step) would be to adapt helpers.PathFinder to handle the creation of the explorer/mdim. So, instead of the user having to import things from multidim or explorer (e.g. the upsert_... function), PathFinder could already know which one to use depending on the type of step. For example, we could either have a paths.create_explorer and a paths.create_multidim method, or just a common paths.create_collection. I'm not strongly opinionated about this; if you prefer to have very different types of code for data steps, explorers and mdims, that's ok too.
We can talk a bit more about this on Thursday, during shaping. Thanks for doing this work!
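
A rough sketch of the single-entry-point variant, to make the idea concrete. Everything here is hypothetical: `PathFinderSketch` stands in for helpers.PathFinder, the two `_create_*` functions are placeholders for the real explorer and multidim tooling, and the `step_type` strings are assumed values.

```python
from typing import Any, Callable


# Placeholders for the real creation functions in explorer and multidim.
def _create_explorer(config: dict[str, Any]) -> str:
    return f"explorer:{config['name']}"


def _create_multidim(config: dict[str, Any]) -> str:
    return f"multidim:{config['name']}"


# Maps a step type to the function that knows how to build that collection.
_CREATORS: dict[str, Callable[[dict[str, Any]], str]] = {
    "explorer": _create_explorer,
    "multidim": _create_multidim,
}


class PathFinderSketch:
    """Hypothetical stand-in for helpers.PathFinder that knows its step type."""

    def __init__(self, step_type: str) -> None:
        self.step_type = step_type

    def create_collection(self, config: dict[str, Any]) -> str:
        """Dispatch to the creator matching this step's type."""
        try:
            creator = _CREATORS[self.step_type]
        except KeyError:
            raise ValueError(f"unsupported step type: {self.step_type}") from None
        return creator(config)


paths = PathFinderSketch(step_type="explorer")
result = paths.create_collection({"name": "covid"})
```

With this shape, a step script only ever calls `paths.create_collection(...)`. Separate `paths.create_explorer` and `paths.create_multidim` methods would be thin wrappers around the same lookup.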

@lucasrodes
Member Author

lucasrodes commented Mar 12, 2025
