Skip to content

Conversation

@kohankhaki
Copy link
Collaborator

@kohankhaki kohankhaki commented Nov 25, 2025

PR Type

Feature

Short Description

This pull request introduces a comprehensive, standardized schema system for all stages of the ACE pipeline. It adds Python dataclasses for each pipeline stage, implements robust save/load I/O utilities, and centralizes all schema imports and exports for easy use. The changes ensure consistent data formats across the pipeline, making it easier to maintain, extend, and integrate different components.
Current pipeline does not use this exact schema. The goal is to standardize future implementations.
src/schemas/README.md and src/schemas/PIPELINE_SCHEMAS.md explain details of the pipeline schema.

Tests Added

None


This change is Reviewable

@kohankhaki kohankhaki requested a review from afkanpour November 25, 2025 19:10
Copy link
Collaborator

@afkanpour afkanpour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@afkanpour reviewed 11 of 11 files at r1, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @kohankhaki)


src/schemas/io_utils.py line 16 at r1 (raw file):

def save_experiment_output(

Why not naming these functions save_experiment(), save_domain(), load_experiment(), etc.?

Code quote:

save_experiment_output

src/schemas/area_schemas.py line 12 at r1 (raw file):

    name: str
    area_id: str

Curious, what are attributes like area_id and domain_id used for?

Code quote:

area_id

src/schemas/area_schemas.py line 14 at r1 (raw file):

    area_id: str
    description: Optional[str] = None
    domain: str = ""

Use a domain object instead.

Code quote:

domain

src/schemas/area_schemas.py line 15 at r1 (raw file):

    description: Optional[str] = None
    domain: str = ""
    domain_id: str = ""

I guess this can be removed if we use a domain object.

Code quote:

domain_id

src/schemas/metadata_schemas.py line 1 at r1 (raw file):

"""Metadata schemas for pipeline stages."""

Please expand the docstring explaining the intention for introducing metadata, how/where they are used.

In general, comments should be expressive so the reader has a good understanding of what the code is supposed to do before diving into the actual implementation.


src/schemas/capability_schemas.py line 14 at r1 (raw file):

    capability_id: str
    description: Optional[str] = None
    area: str = ""

If we have objects for area, domain, etc., let's use them instead of strings. Please apply this change everywhere.

Code quote:

area

src/schemas/experiment_schemas.py line 12 at r1 (raw file):

    experiment_id: str
    domain: str

Why not pass a domain object? That way, we can remove domain_id here too.

Code quote:

domain

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants