feat: Add support for acausal crosscoders. #475
Draft
mkbehr wants to merge 61 commits into decoderesearch:main from
Add unit tests for the CrossCoder SAE's ability to collect activations from the same hook type at multiple layers. These tests verify:
- Initialization with multiple layers
- Activation collection from all layers
- Proper buffer and batch handling
- Layer-specific normalization
- Backward compatibility

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Included tests:
- test_crosscoder_sae_init
- test_crosscoder_sae_fold_w_dec_norm

hook_z excluded from tests.
Description
Add support for acausal crosscoders, as described in https://transformer-circuits.pub/2024/crosscoders/index.html.

Add a new `CrosscoderSAE` class and related training classes. Crosscoder SAEs take activations from multiple hook points and predict activations at multiple hook points, so the input and output both have shape `(..., n_layers, d_in)`.

In this PR, the input and predicted activations can only come from the same set of layers. This means that the crosscoders have to be acausal, and they don't support `use_error_term=False` or any kind of reconstruction evals. In future changes, we can support causal crosscoders by letting the crosscoder take inputs from one set of hooks and predict another set of hooks. That would also support transcoders as a special case.

The new config parameter `hook_names` controls whether to use a standard SAE or a crosscoder, and determines which hook points are used. `hook_names` also causes `ActivationsStore` to return multiple layers, along the existing layer dimension.

The current inheritance setup in this draft isn't very clean, and doesn't pass type checking. It might be easier to represent this after #425 or #468 is merged.
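To make the `(..., n_layers, d_in)` shape concrete, here is a minimal NumPy sketch of an acausal crosscoder forward pass in the style of the linked crosscoders write-up: one shared latent code is computed from all layers, and each latent decodes back to every layer. This is an illustration only, not the code in this PR. The function name `crosscoder_forward`, the weight layouts, and the example hook names are assumptions chosen for the sketch.

```python
import numpy as np


def crosscoder_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Illustrative acausal crosscoder forward pass (not the PR's implementation).

    x:     (..., n_layers, d_in)   activations stacked along a layer axis
    W_enc: (n_layers, d_in, d_sae) assumed layout: one encoder matrix per layer
    b_enc: (d_sae,)
    W_dec: (d_sae, n_layers, d_in) assumed layout: one decoder vector per latent per layer
    b_dec: (n_layers, d_in)
    """
    # Encoder: contract over both the layer and d_in axes, so every layer
    # contributes to a single shared set of latent pre-activations.
    pre_acts = np.einsum("...ld,lds->...s", x, W_enc) + b_enc
    feature_acts = np.maximum(pre_acts, 0.0)  # ReLU
    # Decoder: each latent writes a reconstruction into every layer.
    x_hat = np.einsum("...s,sld->...ld", feature_acts, W_dec) + b_dec
    return feature_acts, x_hat


# Hypothetical hook points; the names are examples only.
hook_names = [f"blocks.{i}.hook_resid_pre" for i in (0, 1, 2)]
batch, d_in, d_sae = 4, 8, 16
n_layers = len(hook_names)

rng = np.random.default_rng(0)
# One (batch, d_in) activation array per hook point, stacked on a new layer axis.
per_layer = [rng.normal(size=(batch, d_in)) for _ in hook_names]
x = np.stack(per_layer, axis=-2)  # shape (batch, n_layers, d_in)

W_enc = rng.normal(size=(n_layers, d_in, d_sae)) * 0.1
b_enc = np.zeros(d_sae)
W_dec = rng.normal(size=(d_sae, n_layers, d_in)) * 0.1
b_dec = np.zeros((n_layers, d_in))

feature_acts, x_hat = crosscoder_forward(x, W_enc, b_enc, W_dec, b_dec)
```

Because the reconstruction at every layer depends on activations from all layers, including later ones, the output cannot simply be spliced back into a single forward pass of the model, which is why this setup is acausal.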
Type of change
Checklist:
You have tested formatting, typing and tests
`make check-ci` to check format and linting. (You can run `make format` to format code if needed.)