@decandido (Contributor) commented Oct 3, 2024

Description

Adds support for OthelloGPT: training on sequences with a start and end position offset (similar to PR #294), plus tests for the added logic and a benchmark of the OthelloGPT SAETrainingRunner.

Fixes #39

Type of change


  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

You have tested formatting, typing and unit tests (acceptance tests not currently in use)

  • I have run make check-ci to check format and linting. (you can run make format to format code if needed.)

@jbloomAus (Contributor)

Thanks @decandido and @zhenningdavidliu!

This has some overlap with #294. Specifically, I'd like to use the seqpos slicing instead of the start and end pos offsets. Other than that this looks good I think!

Requests:

  • [ ] Please rebase or merge with Support seqpos slicing #294 (which will also make sure the seqpos slice ends up in the config)
  • [ ] Optional: Do you want to make a short tutorial on training / using the OthelloGPT SAE?

Once that's done I'll accept :)
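For readers following the thread: seqpos slicing from #294 replaces the start/end position offsets with a single slice over the sequence dimension of the activations. A minimal sketch of the idea (the helper name `slice_seqpos` and the toy data are illustrative, not the library's actual API):

```python
# Sketch of seqpos slicing: keep only the chosen sequence positions of a
# [batch, seq, d_in]-shaped nested list. The helper name is hypothetical.
def slice_seqpos(activations, seqpos_slice):
    start, end = seqpos_slice
    # Python slicing handles `None` bounds and negative indices for us.
    return [seq[start:end] for seq in activations]

# Toy "activations": 2 sequences of 5 positions, 1-dim features.
acts = [[[0], [1], [2], [3], [4]], [[5], [6], [7], [8], [9]]]

# Dropping the first and last position, i.e. what start_pos_offset=1 and
# end_pos_offset=1 expressed in the pre-rebase version of this PR.
print(slice_seqpos(acts, (1, -1)))  # [[[1], [2], [3]], [[6], [7], [8]]]
```

Storing the slice bounds in the config (rather than two separate offsets) is what the first checklist item above is asking for.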

@chanind (Collaborator) left a comment:

If I understand this correctly, these offset parameters only make sense when the model is given sequences of a fixed length and context_size is set to that length, e.g. OthelloGPT?


stacked_activations = torch.zeros((n_batches, n_context, 1, self.d_in))
# For some models, we might want to exclude some positions from the sequence to train on
training_context_slice = list(
@chanind (Collaborator):

Creating a list of indices like this is inefficient. It's better to create an actual slice object, like training_context_slice = slice(self.start_pos_offset, n_context - self.end_pos_offset). Alternatively, you can probably just do something like the following directly when selecting the specific indices:

stacked_activations[:, :, 0] = layerwise_activations[self.hook_name][
      :, start_pos:end_pos, self.hook_head_index
]
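To illustrate the reviewer's point with a standalone sketch (the shapes and offsets below are made up, not the PR's actual values): a `slice` object avoids materialising a Python list of indices, and indexing a tensor with a slice returns a view, whereas fancy indexing with an index list copies.

```python
# Toy stand-ins for the real attributes; values are illustrative only.
n_context = 8
start_pos_offset, end_pos_offset = 2, 1

# Original approach: materialise every index in a Python list
# (allocates the list, and triggers copying fancy indexing on tensors).
training_context_list = list(range(start_pos_offset, n_context - end_pos_offset))

# Suggested approach: a slice object, with no per-index allocation.
training_context_slice = slice(start_pos_offset, n_context - end_pos_offset)

# Both select the same positions.
data = list(range(n_context))
print(data[training_context_slice])  # [2, 3, 4, 5, 6]
assert data[training_context_slice] == [data[i] for i in training_context_list]
```

The same `slice` can also be written inline, as in the reviewer's `[:, start_pos:end_pos, ...]` suggestion above.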

@decandido (Contributor, Author):

Hey David, thanks for your thorough review of our PR! As Joseph mentioned in his comment above, we've rebased on Callum's PR and added a few tests of our own. The issues you flagged are no longer present in the rebased code, but your comments were really helpful to @zhenningdavidliu and me! 🙏

@zhenningdavidliu force-pushed the issues/issue-39-support-othellogpt branch from 7c333e8 to 1b39d82 on October 4, 2024 10:32
@decandido marked this pull request as draft October 7, 2024 08:01
callummcdougall and others added 26 commits October 9, 2024 14:40
@decandido force-pushed the issues/issue-39-support-othellogpt branch from 8539145 to 9130ff9 on October 9, 2024 12:41
@decandido (Contributor, Author)

> Requests:
>
>   • [ ] Please rebase or merge with Support seqpos slicing #294 (which will also make sure the seqpos slice ends up in the config)
>   • [ ] Optional: Do you want to make a short tutorial on training / using the OthelloGPT SAE?

Thanks @jbloomAus for reviewing our PR! We've rebased on #294 and removed the start and end pos offsets.

We also added a short script to train SAEs on OthelloGPT (scripts/training_a_sparse_autoencoder_othelloGPT.py). Let us know if that's what you were looking for!

@decandido marked this pull request as ready for review October 9, 2024 13:22
@callummcdougall (Contributor)

Hey, thanks for all your awesome work here! Could I possibly bump this PR? Just because it would also be super useful for the final bits of the ARENA material to run smoothly (-:

@chanind (Collaborator) left a comment:

LGTM! Will leave it to Joseph to merge

@callummcdougall (Contributor)

Thanks! (I'm actually also realizing that the context length issue wasn't part of this PR, it was an earlier one, and that was the main thing making things a bit clunky, so it's less important than I thought, but it would still be great!)

@chanind chanind merged commit 7047f87 into decoderesearch:main Oct 15, 2024