[Feature request] Nuke the Assistant Axis

The "Assistant Axis" refers to a concept in language models that represents the default helpful persona these models adopt. It helps stabilize their behavior by preventing them from drifting into less desirable character archetypes during interactions.

The research behind this comes from Anthropic: https://arxiv.org/abs/2601.10387

Basically, how closely a model aligns with this axis reflects how strongly it adopts the generic "helpful assistant" persona. This hinders role-playing and leads to refusals.

I don't know whether this is in the scope of this project, but eliminating the impact of this "Assistant" persona in models could be great for role-playing in general, and it would save resources for community fine-tuners when they fine-tune models on RP data.
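
To make the request a bit more concrete, here is a toy sketch of what "eliminating the impact" could mean, assuming the axis can be treated as a single direction in a model's hidden-state space. That is an assumption for illustration only, and the names `ablate_direction`, `assistant_axis`, and `hidden_state` are hypothetical; this is not a claim about how this project or the paper works.

```python
import math

def ablate_direction(hidden, axis):
    """Remove the component of `hidden` that lies along `axis`."""
    norm = math.sqrt(sum(a * a for a in axis))
    if norm == 0:
        raise ValueError("axis must be non-zero")
    unit = [a / norm for a in axis]
    # Scalar projection of the hidden state onto the unit axis.
    coeff = sum(h * u for h, u in zip(hidden, unit))
    # Subtract that component; the result is orthogonal to the axis.
    return [h - coeff * u for h, u in zip(hidden, unit)]

# Toy 3-dimensional vectors; real hidden states have thousands of dimensions.
assistant_axis = [1.0, 0.0, 0.0]  # hypothetical direction, not a measured one
hidden_state = [2.0, 3.0, -1.0]
ablated = ablate_direction(hidden_state, assistant_axis)
```

In this toy case the first coordinate is zeroed out and the other two are left untouched. Whether something like this is useful in practice depends on the axis actually being a stable, linear direction in a given model.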

---

Edit: Seems like this axis is *forced*, not built-in. However, if it ever becomes built-in, we will need to look into this again.

[Feature request] Nuke the Assistant Axis #163
