@decandido (Contributor) commented Oct 3, 2024

Description

Adds support for OthelloGPT: training on sequences with a start and end position offset (similar to PR #294), plus tests for the added logic and a benchmark of the OthelloGPT SAETrainingRunner.

Fixes #39

Type of change


  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

You have tested formatting, typing and unit tests (acceptance tests not currently in use)

  • I have run make check-ci to check format and linting. (you can run make format to format code if needed.)

@jbloomAus (Contributor)

Thanks @decandido and @zhenningdavidliu!

This has some overlap with #294. Specifically, I'd like to use the seqpos slicing instead of the start and end pos offsets. Other than that this looks good I think!

Requests:

  • [ ] Please rebase or merge with Support seqpos slicing #294 (which will also make sure the seqpos slice ends up in the config)
  • [ ] Optional: Do you want to make a short tutorial on training / using the OthelloGPT SAE?

Once that's done I'll accept :)
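For readers following the thread: seqpos slicing from #294 replaces the start/end position offsets with a single slice over the sequence dimension of the activations. A minimal sketch of the idea (the helper name `slice_seqpos` and the toy data are illustrative, not the library's actual API):

```python
# Sketch of seqpos slicing: keep only the chosen sequence positions of a
# [batch, seq, d_in]-shaped nested list. The helper name is hypothetical.
def slice_seqpos(activations, seqpos_slice):
    start, end = seqpos_slice
    # Python slicing handles `None` bounds and negative indices for us.
    return [seq[start:end] for seq in activations]

# Toy "activations": 2 sequences of 5 positions, 1-dim features.
acts = [[[0], [1], [2], [3], [4]], [[5], [6], [7], [8], [9]]]

# Dropping the first and last position, i.e. what start_pos_offset=1 and
# end_pos_offset=1 expressed in the pre-rebase version of this PR.
print(slice_seqpos(acts, (1, -1)))  # [[[1], [2], [3]], [[6], [7], [8]]]
```

Storing the slice bounds in the config (rather than two separate offsets) is what the first checklist item above is asking for.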

@chanind (Collaborator) left a comment:

If I understand this correctly, these offset parameters only make sense when the model is given sequences of a fixed length and context_size is set to that length, e.g. OthelloGPT?


stacked_activations = torch.zeros((n_batches, n_context, 1, self.d_in))
# For some models, we might want to exclude some positions from the sequence to train on
training_context_slice = list(
@chanind (Collaborator):

Creating a list of indices like this is inefficient. It's better to create an actual slice object, like training_context_slice = slice(self.start_pos_offset, n_context - self.end_pos_offset). Alternatively, you can probably just do something like the following directly when selecting the specific indices:

stacked_activations[:, :, 0] = layerwise_activations[self.hook_name][
      :, start_pos:end_pos, self.hook_head_index
]
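To illustrate the reviewer's point with a standalone sketch (the shapes and offsets below are made up, not the PR's actual values): a `slice` object avoids materialising a Python list of indices, and indexing a tensor with a slice returns a view, whereas fancy indexing with an index list copies.

```python
# Toy stand-ins for the real attributes; values are illustrative only.
n_context = 8
start_pos_offset, end_pos_offset = 2, 1

# Original approach: materialise every index in a Python list
# (allocates the list, and triggers copying fancy indexing on tensors).
training_context_list = list(range(start_pos_offset, n_context - end_pos_offset))

# Suggested approach: a slice object, with no per-index allocation.
training_context_slice = slice(start_pos_offset, n_context - end_pos_offset)

# Both select the same positions.
data = list(range(n_context))
print(data[training_context_slice])  # [2, 3, 4, 5, 6]
assert data[training_context_slice] == [data[i] for i in training_context_list]
```

The same `slice` can also be written inline, as in the reviewer's `[:, start_pos:end_pos, ...]` suggestion above.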

@decandido (Contributor, Author):

Hey David, thanks for your thorough review of our PR! As Joseph mentioned in his comment above, we've rebased on Callum's PR and added a few tests of our own. The issues you flagged are no longer present in the rebased code, but your comments were really helpful to @zhenningdavidliu and me! 🙏

@zhenningdavidliu force-pushed the issues/issue-39-support-othellogpt branch from 7c333e8 to 1b39d82 on October 4, 2024 10:32
@decandido marked this pull request as draft October 7, 2024 08:01
callummcdougall and others added 26 commits October 9, 2024 14:40
@decandido force-pushed the issues/issue-39-support-othellogpt branch from 8539145 to 9130ff9 on October 9, 2024 12:41
@decandido (Contributor, Author)

> Requests:
>
>   • [ ] Please rebase or merge with Support seqpos slicing #294 (which will also make sure the seqpos slice ends up in the config)
>   • [ ] Optional: Do you want to make a short tutorial on training / using the OthelloGPT SAE?

Thanks @jbloomAus for reviewing our PR! We've rebased on #294 and removed the start and end pos offsets.

We also added a short script to train SAEs on OthelloGPT (scripts/training_a_sparse_autoencoder_othelloGPT.py). Let us know if that's what you were looking for!

@decandido marked this pull request as ready for review October 9, 2024 13:22
@callummcdougall (Contributor)

Hey, thanks for all your awesome work here! Could I possibly bump this PR? Just because it would also be super useful for the final bits of the ARENA material to run smoothly (-:

@chanind (Collaborator) left a comment:

LGTM! Will leave it to Joseph to merge

@callummcdougall (Contributor)

Thanks! (I'm actually also realizing that the context length issue wasn't part of this PR, it was an earlier one, and that was the main thing making things a bit clunky, so it's less important than I thought, but it would still be great!)

@chanind chanind merged commit 7047f87 into decoderesearch:main Oct 15, 2024