Define and document base model attributes before further case development #6

@ghxm

Problem

The base model hierarchy (Procedure, Event, Document) has a minimal attribute set that works for the EU case, but it has not been systematically validated: it is unclear whether these attributes are genuinely universal or accidentally EU-shaped. Before proceeding with template refinement, new cases (US/UK), or researcher-facing documentation, the base attributes need to be properly defined and documented.

Plan

1. Define the canonical attribute set per entity

  • What fields belong on the base models vs case-specific subclasses?
  • Which interface properties (start_event, adoption_event, end_event, status) are truly universal? This includes naming (e.g., introduction or proposal date vs. start date)
  • What additional lifecycle concepts are needed (rejection, expiry, lapse, veto)?

2. Operationalize each attribute

  • Precise definitions: what does "start_date" mean across legislative systems? Is it introduction, proposal, filing?
  • Type decisions: dates, enums, free text, identifiers
  • Cardinality: single value or list? Optional or required?
  • How computed properties (start_date, duration, status) relate to stored fields
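
To make the last point concrete, here is a minimal sketch of computed properties derived from stored events. The event kinds ("start", "end"), the status values, and the class layout are assumptions for illustration, not the project's actual models.

```python
import datetime as dt
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    kind: str       # hypothetical label, e.g. "start" or "end"
    date: dt.date


@dataclass
class Procedure:
    # Stored field: in this sketch, events are the only persisted state.
    events: list[Event] = field(default_factory=list)

    def _first(self, kind: str) -> Optional[Event]:
        """Return the first stored event of the given kind, if any."""
        return next((e for e in self.events if e.kind == kind), None)

    @property
    def start_event(self) -> Optional[Event]:
        return self._first("start")

    @property
    def end_event(self) -> Optional[Event]:
        return self._first("end")

    @property
    def start_date(self) -> Optional[dt.date]:
        # Computed: never stored, always read from start_event.
        ev = self.start_event
        return ev.date if ev is not None else None

    @property
    def duration(self) -> Optional[int]:
        # Computed: days between start and end, or None if either is missing.
        start, end = self.start_event, self.end_event
        if start is None or end is None:
            return None
        return (end.date - start.date).days

    @property
    def status(self) -> str:
        # Assumed status vocabulary; the real enum is one of the open decisions.
        if self.end_event is not None:
            return "completed"
        return "ongoing" if self.start_event is not None else "unknown"
```

Deriving start_date, duration, and status from events means there is one source of truth, but it also means every case has to say which event counts as the start.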

3. Document the decisions

  • Researcher-facing data dictionary with plain-language definitions
  • Justification for each base attribute (why universal, not case-specific?)
  • Mapping table: base concept -> EU operationalization -> (future) US/UK operationalization
  • Known limitations and edge cases
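
One possible shape for a data dictionary entry and its mapping row is sketched below. The AttributeSpec class, its field names, and the placeholder definition text are hypothetical. The only operationalization filled in is the EU one mentioned in this issue (start_date as the Commission proposal date); US and UK stay undecided.

```python
from dataclasses import dataclass
from typing import Optional

CASES = ("EU", "US", "UK")


@dataclass(frozen=True)
class AttributeSpec:
    name: str
    definition: str                               # plain-language definition
    justification: str                            # why universal, not case-specific
    operationalization: dict[str, Optional[str]]  # case -> how it is measured

    def undecided_cases(self) -> list[str]:
        """Cases for which no operationalization has been written down yet."""
        return [c for c in CASES if not self.operationalization.get(c)]


START_DATE = AttributeSpec(
    name="start_date",
    definition="(to be settled)",
    justification="(to be settled)",
    operationalization={"EU": "Commission proposal date", "US": None, "UK": None},
)


def mapping_row(spec: AttributeSpec) -> str:
    """Render one row of the base concept -> per-case mapping table."""
    cells = [spec.name] + [spec.operationalization.get(c) or "(undecided)" for c in CASES]
    return " | ".join(cells)
```

For the entry above, mapping_row(START_DATE) gives `start_date | Commission proposal date | (undecided) | (undecided)`, which makes the gaps in the mapping table explicit.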

4. Validate against multiple legislative systems

  • Confirm the base model works for EU + at least one other system (US or UK sketch)
  • Identify attributes that seem universal but have different semantics across systems
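
A structural part of this validation could be automated. The sketch below lists which interface properties a case subclass still inherits unchanged from the base class. The class names and return values are hypothetical, and the check covers only whether a property is overridden, not whether its semantics match across systems.

```python
INTERFACE = ("start_event", "adoption_event", "end_event", "status")


class Procedure:
    """Base class whose interface properties default to 'not available'."""

    @property
    def start_event(self):
        return None

    @property
    def adoption_event(self):
        return None

    @property
    def end_event(self):
        return None

    @property
    def status(self):
        return None


class SketchCaseProcedure(Procedure):
    """Hypothetical second-system sketch that overrides only two properties."""

    @property
    def start_event(self):
        return "introduction"

    @property
    def status(self):
        return "ongoing"


def inherited_defaults(case_cls, base_cls=Procedure, names=INTERFACE):
    """Names of interface properties the case class has not overridden."""
    return [n for n in names if getattr(case_cls, n) is getattr(base_cls, n)]
```

Here inherited_defaults(SketchCaseProcedure) returns `["adoption_event", "end_event"]`, flagging the concepts that still need a decision for that system.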

Impact

  • Template refinement (openbasement): without knowing what the target model needs, we can't prioritize which CDM predicates to add
  • New case implementation: unless the base model is settled first, a US/UK case would expose its design flaws too late
  • Dataset documentation: we can't write a proper codebook without settled definitions
  • Codebook generation: the extract_codebook() pipeline produces field tables, but the content of those tables depends on getting the definitions right first
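
To illustrate that dependency, here is a sketch of a field table built from definitions attached to the model itself. It is not the actual extract_codebook() implementation: field_table, undocumented, the example fields, and the definition texts are all hypothetical.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import Optional


@dataclass
class Procedure:
    # Definitions live in field metadata; the texts here are placeholders.
    identifier: str = field(metadata={"definition": "Stable identifier of the procedure."})
    title: Optional[str] = field(default=None, metadata={"definition": "Official title, if any."})
    status: Optional[str] = None  # no definition recorded yet


def field_table(model) -> list[dict]:
    """Build one codebook row per dataclass field."""
    rows = []
    for f in fields(model):
        # A field is required when it has neither a default nor a default factory.
        required = f.default is MISSING and f.default_factory is MISSING
        rows.append({
            "field": f.name,
            "required": required,
            "definition": f.metadata.get("definition", ""),
        })
    return rows


def undocumented(model) -> list[str]:
    """Fields that would appear in the codebook with an empty definition."""
    return [row["field"] for row in field_table(model) if not row["definition"]]
```

A table like this can be generated at any time, but undocumented(Procedure) returning `["status"]` shows that the output is only as good as the definitions behind it.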

Current state

The base hierarchy is: Entity > Document, Event, Procedure. EU subclasses add typed domain fields. Interface properties (start_event, adoption_event, etc.) are overridable. This works but hasn't been stress-tested against a second legislative system.
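
A minimal sketch of that shape, assuming dataclass-based models, might look like the following. The field names, the "commission_proposal" event kind, and the base default for start_event are illustrative assumptions rather than the current code.

```python
import datetime as dt
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    id: str


@dataclass
class Event(Entity):
    kind: str
    date: dt.date


@dataclass
class Document(Entity):
    title: Optional[str] = None


@dataclass
class Procedure(Entity):
    events: list[Event] = field(default_factory=list)

    @property
    def start_event(self) -> Optional[Event]:
        # Base default (assumption): the chronologically earliest event.
        return min(self.events, key=lambda e: e.date, default=None)


@dataclass
class EUProcedure(Procedure):
    procedure_type: Optional[str] = None  # hypothetical typed domain field

    @property
    def start_event(self) -> Optional[Event]:
        # EU override: the start is the Commission proposal, not the first event.
        proposals = [e for e in self.events if e.kind == "commission_proposal"]
        return min(proposals, key=lambda e: e.date, default=None)
```

With the same events, the base class and the EU subclass can disagree on start_event, which is exactly the kind of divergence a second legislative system would need to be checked against.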

Key open question from the audit: EUProcedure.date (from cdm:date) is vaguely defined as "typically the latest significant event." Its relationship to start_date (the Commission proposal date) needs clarification or the field needs to be renamed/removed.
