Skip to content

Lift derived head_dim into a shared NormalizedConfig base class #401

Description

@vortex-captain

Context

Follow-up from PR #382 review — see this comment by @zhenchaoni.

Several HuggingFace model configs do not expose head_dim natively (e.g. BartConfig, MarianConfig, and per the comment also blip and trocr). To make these models work with PastKeyValueInputGenerator (which now reads normalized_config.head_dim unconditionally), PR #382 introduced per-model NormalizedConfig subclasses with a computed head_dim:

  • src/winml/modelkit/models/hf/bart.py_BartDecoderNormalizedConfig.head_dim
  • src/winml/modelkit/models/hf/marian.py_MarianDecoderNormalizedConfig.head_dim

Both implementations are identical:

@property
def head_dim(self) -> int:
    return self.hidden_size // self.num_attention_heads

Proposal

Lift the derived head_dim property into a shared NormalizedConfig base class (e.g. _DerivedHeadDimNormalizedConfig in a common module under models/winml/ or models/hf/) so that BART, Marian, and the upcoming BLIP / TrOCR configs can simply inherit it, instead of redefining the same property in each subclass.

Acceptance criteria

  • New shared base class with the derived head_dim property in a common location
  • _BartDecoderNormalizedConfig and _MarianDecoderNormalizedConfig inherit from it and drop their local head_dim overrides
  • BLIP and TrOCR (when added) reuse the same base class
  • Existing BART / Marian export tests still pass

Metadata

Metadata

Assignees

No one assigned

    Labels

    P2Medium — minor bug or non-critical improvementrefactorCode refactoringtriagedIssue has been triaged

    Type

    No fields configured for Task.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions