Feature Request: Add support for Bytedance Seed-OSS models #15483

@matt23654

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Support the new Bytedance OSS model: https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Instruct.

Motivation

This model has strong benchmark results for its size class (~30B models), and its architecture does not look too different from the regular LLaMA architecture (see below).

This model may also attract future derivative fine-tunes, because it was released alongside a 36B base model that appears to be very high quality.

Possible Implementation

I have looked at the ByteDance transformers code, which is awaiting merge at https://github.com/Fazziekey/transformers/tree/seed-oss, and the architecture does not look too different from what llama.cpp already implements:

  1. Only the MLP and attention blocks appear different from the vanilla LLaMA architecture.
  2. The MLP and attention blocks have new residual_dropout parameters, but my understanding is that these are unused during inference (they are implemented as nn.functional.dropout, which is a no-op when training=False).
  3. The attention mechanism has bias enabled/disabled separately for the QKV projections and the output projection. In the published model there are bias terms for QKV but none for the output projection. I think this is the only implementation difference.
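To make points 2 and 3 concrete, here is a minimal PyTorch sketch of how I read the attention block: bias on Q/K/V projections, no bias on the output projection, and a residual_dropout that is inert at inference. All names and dimensions are illustrative assumptions, not taken from the actual Seed-OSS code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeedOssAttentionSketch(nn.Module):
    """Hypothetical sketch of the attention differences described above.

    Assumptions (not verified against the real model code):
      - bias=True on the Q/K/V projections, bias=False on the output projection
      - residual_dropout is applied via F.dropout, so it is an identity op
        whenever training=False (i.e. during inference)
    """

    def __init__(self, hidden_size=64, num_heads=4, residual_dropout=0.1):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        # QKV projections carry bias terms in the published checkpoint...
        self.q_proj = nn.Linear(hidden_size, hidden_size, bias=True)
        self.k_proj = nn.Linear(hidden_size, hidden_size, bias=True)
        self.v_proj = nn.Linear(hidden_size, hidden_size, bias=True)
        # ...while the output projection has none.
        self.o_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.residual_dropout = residual_dropout

    def forward(self, x):
        b, t, _ = x.shape
        # Split heads: (b, t, hidden) -> (b, num_heads, t, head_dim)
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = self.o_proj(attn.transpose(1, 2).reshape(b, t, -1))
        # No-op in eval mode, so a GGUF conversion can ignore it entirely.
        return F.dropout(out, p=self.residual_dropout, training=self.training)
```

If this reading is right, the llama.cpp port mostly reduces to loading the QKV bias tensors and skipping the (nonexistent) output bias, much like other LLaMA-style variants with QKV bias.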


    Labels

    enhancement (New feature or request)
