
Conversation

@hmellor
Member

@hmellor hmellor commented Dec 1, 2025

Since we have the helper function, it'd be better to use it so that the behaviour is consistent.

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to refactor attribute access to use a helper function for consistency. While the change in get_inputs_embeds_size is correct, the change in get_total_num_kv_heads introduces a critical bug. The refactoring alters the logic for finding the number of KV heads. The original code correctly skipped attributes that were present but None, while the new implementation with getattr_iter will return None if it's the value of the first attribute found. This can lead to a TypeError downstream. I've left a specific comment with a suggested fix.

Comment on lines 1314 to +1317
```diff
         # For non-grouped-query attention models, the number of KV heads is
         # equal to the number of attention heads.
-        return self.hf_text_config.num_attention_heads
+        default = self.hf_text_config.num_attention_heads
+        return getattr_iter(self.hf_text_config, attributes, default)
```
Contributor


critical

This refactoring introduces a critical bug. The original implementation iterated through attributes and returned the value of the first attribute that was not None. The new implementation using getattr_iter will return the value of the first attribute that exists, even if that value is None.

For example, if self.hf_text_config has an attribute n_head_kv=None and num_kv_heads=8, the old code would correctly return 8. The new code would return None, which would cause a TypeError when this function's return value is used in comparisons.

Please revert to the original logic, which was correct.

```python
        for attr in attributes:
            num_kv_heads = getattr(self.hf_text_config, attr, None)
            if num_kv_heads is not None:
                return num_kv_heads

        # For non-grouped-query attention models, the number of KV heads is
        # equal to the number of attention heads.
        return self.hf_text_config.num_attention_heads
```
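The behaviour difference can be demonstrated in isolation. The `getattr_iter` body below is a minimal sketch inferred from the review's description (first attribute that *exists* wins), not vLLM's actual helper, and `first_non_none` stands in for the original loop:

```python
from types import SimpleNamespace

def getattr_iter(obj, attributes, default):
    # Sketch of the assumed helper semantics: return the value of the
    # first attribute that exists, even if that value is None.
    for attr in attributes:
        if hasattr(obj, attr):
            return getattr(obj, attr)
    return default

def first_non_none(obj, attributes, default):
    # The original loop: skip attributes whose value is None.
    for attr in attributes:
        value = getattr(obj, attr, None)
        if value is not None:
            return value
    return default

# Example from the review: n_head_kv exists but is None.
config = SimpleNamespace(n_head_kv=None, num_kv_heads=8)
attributes = ["n_head_kv", "num_kv_heads"]

print(getattr_iter(config, attributes, 32))   # None (the bug)
print(first_non_none(config, attributes, 32)) # 8 (original behaviour)
```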

```diff
         # For non-grouped-query attention models, the number of KV heads is
         # equal to the number of attention heads.
-        return self.hf_text_config.num_attention_heads
+        default = self.hf_text_config.num_attention_heads
```
Collaborator


Does self.hf_text_config.num_attention_heads always exist? If not, the existing code might not fail, but the current code certainly will, because the default is evaluated eagerly before getattr_iter is even called. Could we pass a lambda default instead?

Suggested change

```diff
-        default = self.hf_text_config.num_attention_heads
+        default = lambda: self.hf_text_config.num_attention_heads
```
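A lazy default only helps if the helper calls it when no attribute matched. The sketch below is an assumed implementation (not vLLM's actual `getattr_iter`) showing why the callable form avoids an eager AttributeError when the fallback attribute is missing:

```python
from types import SimpleNamespace

def getattr_iter(obj, attributes, default):
    # Assumed semantics: first existing attribute wins.
    for attr in attributes:
        if hasattr(obj, attr):
            return getattr(obj, attr)
    # Only evaluate the fallback when nothing matched.
    return default() if callable(default) else default

# Config with num_kv_heads but *no* num_attention_heads.
config = SimpleNamespace(num_kv_heads=8)

# Eager default would raise AttributeError before the call:
#   getattr_iter(config, ["num_kv_heads"], config.num_attention_heads)

# Lazy default: the lambda is never invoked, since num_kv_heads exists.
value = getattr_iter(config, ["num_kv_heads"],
                     lambda: config.num_attention_heads)
print(value)  # 8
```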
