Ablation Technique - pre-bias #322

  Wanted to bounce something off you. Have you tried router pre-bias as an ablation mechanism on MoE models?

  The idea: instead of (or in addition to) the usual residual-direction orthogonalization or expert zero-out, we bias the gate logits before the top-k routing decision. For Qwen3.5-MoE / Qwen3.6-MoE (256 experts,
  top-8 per token), the gate produces per-expert logits, and softmax + top-k over those logits selects the experts. We add a per-expert bias b[L,e] = -α × log_ratio[L,e] to the logits, where L is the layer, e is the
  expert, and log_ratio[L,e] = log(p_refused_routes_to_e / p_complied_routes_to_e) is estimated from an activation pass over refused and complied prompts.
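To make the bias formula concrete, here is a minimal NumPy sketch. The function name `router_pre_bias`, the count-based probability estimate, and the `eps` smoothing term are illustrative assumptions, not our actual pipeline; it only shows how b = -α × log_ratio falls out of per-expert routing counts.

```python
import numpy as np

def router_pre_bias(refused_counts, complied_counts, alpha, eps=1e-6):
    """Per-expert gate bias b = -alpha * log(p_refused / p_complied).

    refused_counts, complied_counts: arrays of shape (num_layers, num_experts)
    counting how often each expert landed in the top-k on each prompt set.
    eps is a smoothing term (an assumption) so unseen experts avoid log(0).
    """
    refused = np.asarray(refused_counts, dtype=np.float64) + eps
    complied = np.asarray(complied_counts, dtype=np.float64) + eps
    # Normalize per layer so each row is a routing probability distribution.
    p_refused = refused / refused.sum(axis=-1, keepdims=True)
    p_complied = complied / complied.sum(axis=-1, keepdims=True)
    log_ratio = np.log(p_refused / p_complied)
    return -alpha * log_ratio

# Toy example: 1 layer, 4 experts (made-up counts, not measured data).
refused = np.array([[30, 10, 5, 5]])
complied = np.array([[10, 10, 15, 15]])
bias = router_pre_bias(refused, complied, alpha=0.5)
```

In the toy example, expert 0 is over-represented on refused prompts and gets a negative bias, expert 1 is routed equally and gets roughly zero, and experts 2 and 3 get a positive bias.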

  Effect: experts that the model preferentially routes to on refused prompts get pushed DOWN, experts that fire on complied prompts get pushed UP. The router still picks the top-8, but the population of "top-8"
  gets shifted away from the refusal-correlated subset. We're not deleting experts, not editing weights — just re-allocating routing.
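A small sketch of what "re-allocating routing" means mechanically. `route_top_k` is a hypothetical stand-in for the model's router, and it assumes the mixing weights are a softmax over the selected (biased) logits; real implementations may compute the weights differently.

```python
import numpy as np

def route_top_k(gate_logits, bias, k):
    """Pick the top-k experts per token after adding a per-expert bias.

    gate_logits: (num_tokens, num_experts); bias: (num_experts,).
    Returns (indices, weights), each of shape (num_tokens, k).
    """
    biased = gate_logits + bias
    # Indices of the k largest biased logits, in descending order.
    top_idx = np.argsort(-biased, axis=-1)[:, :k]
    top_logits = np.take_along_axis(biased, top_idx, axis=-1)
    # Softmax over the selected experts only (numerically stabilized).
    top_logits = top_logits - top_logits.max(axis=-1, keepdims=True)
    weights = np.exp(top_logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return top_idx, weights

# Toy example: one token, four experts, top-2 routing.
logits = np.array([[2.0, 1.5, 1.0, 0.5]])
no_bias = np.zeros(4)
pre_bias = np.array([-1.0, 0.0, 0.0, 0.8])  # push expert 0 down, expert 3 up

before_idx, _ = route_top_k(logits, no_bias, k=2)
after_idx, after_w = route_top_k(logits, pre_bias, k=2)
```

Without the bias the token goes to experts 0 and 1; with it, expert 0 drops out and expert 3 takes its place. No expert weights are touched.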

  Empirical results (Qwen3.5-35B-A3B and Qwen3.6-35B-A3B, abliterating safety refusals):

  - Sharp, non-monotonic optimum at α=0.5: +0.093 composite-score gain over baseline (the biggest single-step jump we'd seen since rank-3 subspace ortho)
  - α=1.0 actually does WORSE than baseline: over-rerouting collapses something
  - Stacks cleanly with residual-orthogonalization (router-bias affects WHICH experts run; ortho affects WHAT they compute — orthogonal axes)
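For contrast with the router side, here is a generic sketch of the residual-orthogonalization half: projecting a k-dimensional subspace out of a matrix that writes into the residual stream. The function name and shapes are assumptions for illustration; this is the textbook projection, not necessarily the exact variant we run.

```python
import numpy as np

def orthogonalize_output(weight, directions):
    """Remove the span of `directions` from the output space of `weight`.

    weight: (d_model, d_in) matrix whose outputs live in the residual stream.
    directions: (k, d_model) vectors spanning the subspace to remove
    (they need not be orthonormal; QR takes care of that).
    """
    basis = np.asarray(directions, dtype=np.float64).T  # (d_model, k)
    q, _ = np.linalg.qr(basis)                           # orthonormal columns
    # W' = W - Q Q^T W: outputs of W' have no component in span(Q).
    return weight - q @ (q.T @ weight)

# Toy example with random data: a rank-3 subspace in a 16-dim residual stream.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8))
dirs = rng.standard_normal((3, 16))
W_ortho = orthogonalize_output(W, dirs)
```

This edits what a layer writes, while the router bias only changes which experts get to write, which is why the two can be combined.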

  On Qwen3.6-35B-A3B with NSGA-II combination search (40 trials), our cleanest ablation came from a pure router-bias plan with a separate α on each of 4 layers (L5_α=2.0, L8_α=0.5, L14_α=0.5, L20_α=1.0), nothing else.
  Result: flip rate 0.9375, MMLU/GSM8K/PPL damage 0.000, composite score 0.9375. No collateral damage, no utility drop: refusal just reroutes around itself.
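The per-layer plan above can be written down as a plain mapping from layer index to α. In this sketch `PLAN` mirrors the reported values, while `biases_from_plan`, the layer count, and the random `log_ratio` are placeholders for illustration only.

```python
import numpy as np

# Layer index -> alpha, as reported for the pure router-bias plan.
PLAN = {5: 2.0, 8: 0.5, 14: 0.5, 20: 1.0}

def biases_from_plan(log_ratio, plan):
    """Return {layer: bias vector} with bias = -alpha * log_ratio[layer].

    Layers not listed in the plan get no entry, i.e. their routers are untouched.
    """
    return {layer: -alpha * log_ratio[layer] for layer, alpha in plan.items()}

# Placeholder inputs: 256 experts as in the post; the layer count and the
# log-ratio values here are stand-ins, not measured numbers.
num_layers, num_experts = 40, 256
rng = np.random.default_rng(0)
log_ratio = rng.standard_normal((num_layers, num_experts))

layer_biases = biases_from_plan(log_ratio, PLAN)
```

Each resulting vector would be added to that layer's gate logits before top-k selection.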
