Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Support the new Bytedance OSS model: https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Instruct.
Motivation
This model posts strong benchmark results for its size class (~30B models), and its architecture doesn't look too different from the regular Llama architecture (see below).
It may also attract future fine-tuned derivatives, because it was released alongside a 36B base model that appears to be very high quality.
Possible Implementation
I have looked at the ByteDance transformers code awaiting merge at https://github.com/Fazziekey/transformers/tree/seed-oss, and the architecture does not look too different from what llama.cpp already implements:
- Only the MLP and Attention modules differ from the vanilla Llama architecture.
- Both add new `residual_dropout` parameters, but my understanding is that these are not used during inference anyway (they are implemented as `nn.functional.dropout`, which is a no-op with `training=False` at inference time).
- The attention mechanism has bias enabled/disabled separately for the QKV projections and the output projection. In the published model there are bias terms on Q/K/V but none on the output layer. I think this is the only implementation difference?
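To illustrate the two differences above, here is a minimal sketch (not the actual transformers code from the linked branch; the class name, dimensions, and the placeholder attention body are assumptions for illustration). It shows bias on the Q/K/V projections but not on the output projection, and a `residual_dropout` that becomes a no-op in eval mode:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeedOssAttentionSketch(nn.Module):
    # Hypothetical sketch of the described bias layout:
    # Q/K/V projections carry bias terms, the output projection does not.
    def __init__(self, hidden_size=4096, residual_dropout=0.1):
        super().__init__()
        self.q_proj = nn.Linear(hidden_size, hidden_size, bias=True)
        self.k_proj = nn.Linear(hidden_size, hidden_size, bias=True)
        self.v_proj = nn.Linear(hidden_size, hidden_size, bias=True)
        self.o_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.residual_dropout = residual_dropout

    def forward(self, x):
        # Real attention (scores, softmax, RoPE, etc.) elided; only the
        # bias layout and dropout behavior are relevant to this sketch.
        attn_out = self.v_proj(x)  # placeholder for the attention output
        out = self.o_proj(attn_out)
        # In eval mode self.training is False, so this dropout is the
        # identity — consistent with it being unused during inference.
        return F.dropout(out, p=self.residual_dropout, training=self.training)
```

If this holds, the GGUF conversion would mainly need to export the Q/K/V bias tensors while leaving the output projection bias absent, which existing per-tensor bias handling in llama.cpp should already accommodate.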