Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Support the new Bytedance OSS model: https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Instruct.
Motivation
This model posts strong benchmark results for its size class (~30B models), and its architecture doesn't look too different from the regular Llama architecture (see below).
It may also attract future fine-tuned derivatives, because it was released alongside a 36B base model that appears to be very high quality.
Possible Implementation
I have looked at the ByteDance transformers code awaiting merge at https://github.com/Fazziekey/transformers/tree/seed-oss, and the architecture does not look too different from what llama.cpp already implements:
- Only the MLP and Attention modules differ from the vanilla Llama architecture.
- Both add new `residual_dropout` parameters, but my understanding is that these are not used during inference anyway (they are implemented as `nn.functional.dropout`, which is a no-op with `training=False` at inference time).
- The attention mechanism has bias enabled/disabled separately for the QKV projections and the output projection. In the published model there are bias terms on Q/K/V but none on the output layer. I think this is the only implementation difference?
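To illustrate the two differences above, here is a minimal sketch (not the actual transformers code from the linked branch; the class name, dimensions, and the placeholder attention body are assumptions for illustration). It shows bias on the Q/K/V projections but not on the output projection, and a `residual_dropout` that becomes a no-op in eval mode:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeedOssAttentionSketch(nn.Module):
    # Hypothetical sketch of the described bias layout:
    # Q/K/V projections carry bias terms, the output projection does not.
    def __init__(self, hidden_size=4096, residual_dropout=0.1):
        super().__init__()
        self.q_proj = nn.Linear(hidden_size, hidden_size, bias=True)
        self.k_proj = nn.Linear(hidden_size, hidden_size, bias=True)
        self.v_proj = nn.Linear(hidden_size, hidden_size, bias=True)
        self.o_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.residual_dropout = residual_dropout

    def forward(self, x):
        # Real attention (scores, softmax, RoPE, etc.) elided; only the
        # bias layout and dropout behavior are relevant to this sketch.
        attn_out = self.v_proj(x)  # placeholder for the attention output
        out = self.o_proj(attn_out)
        # In eval mode self.training is False, so this dropout is the
        # identity — consistent with it being unused during inference.
        return F.dropout(out, p=self.residual_dropout, training=self.training)
```

If this holds, the GGUF conversion would mainly need to export the Q/K/V bias tensors while leaving the output projection bias absent, which existing per-tensor bias handling in llama.cpp should already accommodate.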