Context
Follow-up from PR #382 review — see this comment by @zhenchaoni.
Several HuggingFace model configs do not expose head_dim natively (e.g. BartConfig, MarianConfig, and per the comment also blip and trocr). To make these models work with PastKeyValueInputGenerator (which now reads normalized_config.head_dim unconditionally), PR #382 introduced per-model NormalizedConfig subclasses with a computed head_dim:
src/winml/modelkit/models/hf/bart.py — _BartDecoderNormalizedConfig.head_dim
src/winml/modelkit/models/hf/marian.py — _MarianDecoderNormalizedConfig.head_dim
Both implementations are identical:
    @property
    def head_dim(self) -> int:
        return self.hidden_size // self.num_attention_heads
Proposal
Lift the derived head_dim property into a shared NormalizedConfig base class (e.g. _DerivedHeadDimNormalizedConfig in a common module under models/winml/ or models/hf/) so that BART, Marian, and the upcoming BLIP / TrOCR configs can simply inherit it, instead of redefining the same property in each subclass.
Acceptance criteria
A shared base class provides the derived head_dim property in a common location
_BartDecoderNormalizedConfig and _MarianDecoderNormalizedConfig inherit from it and drop their local head_dim overrides